Posted to commits@couchdb.apache.org by va...@apache.org on 2019/05/07 04:53:34 UTC

[couchdb-documentation] branch rfc/007-background-workers updated: RFC for CouchDB background workers

This is an automated email from the ASF dual-hosted git repository.

vatamane pushed a commit to branch rfc/007-background-workers
in repository https://gitbox.apache.org/repos/asf/couchdb-documentation.git


The following commit(s) were added to refs/heads/rfc/007-background-workers by this push:
     new c03238b  RFC for CouchDB background workers
c03238b is described below

commit c03238ba84fd8c3e879a7f4f16a71e0637f695e6
Author: Nick Vatamaniuc <va...@apache.org>
AuthorDate: Tue May 7 00:52:32 2019 -0400

    RFC for CouchDB background workers
---
 rfcs/007-background-workers.md | 252 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 252 insertions(+)

diff --git a/rfcs/007-background-workers.md b/rfcs/007-background-workers.md
new file mode 100644
index 0000000..b5346c7
--- /dev/null
+++ b/rfcs/007-background-workers.md
@@ -0,0 +1,252 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Background workers with FoundationDB backend'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+This document describes the data model and behavior of CouchDB background workers.
+
+## Abstract
+
+CouchDB background workers are used for things like index building and
+replication. We present a generalized model that allows creation, running, and
+monitoring of these jobs. "Jobs" are represented generically such that both
+replication and indexing could take advantage of the same framework. The basic
+idea is that of a global job queue for each job type. New jobs are inserted
+into the jobs table and enqueued for execution.
+
+There are a number of workers that attempt to dequeue pending jobs and run
+them. "Running" is specific to each job type and would be different for
+replication and indexing. Workers are processes which execute jobs. They MAY
+be individual Erlang processes, but could also be implemented in Python, Java,
+or any other environment with a FoundationDB client. The only coordination
+between workers happens via the database. Workers can start and stop at any
+time. Workers monitor each other for liveliness, and if a worker terminates
+abruptly, all of the dead worker's jobs are re-enqueued into the global
+pending queue.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+---
+
+`Job`: A unit of work, identified by a `JobId` and also having a `JobType`.
+
+`Job table`: A subspace holding the list of jobs indexed by `JobId`.
+
+`Pending queue`: A queue of jobs which are ready to run. Workers may pick jobs
+from this queue and start running them.
+
+`Active jobs`: A subspace holding the list of jobs currently run by a
+particular worker.
+
+`Worker`: A job execution unit. Workers could be individual processes or groups
+of processes running on remote nodes.
+
+`Health key` : A key that a worker periodically updates with a timestamp to
+indicate that it is "alive" and ready to process jobs.
+
+`Versionstamp`: A 12-byte, unique, monotonically (but not sequentially)
+increasing value for each committed transaction.
+
+---
+
+# Detailed Description
+
+## Data Model
+
+The main job table:
+ `("couch_workers", "jobs", JobType, JobId) = (JobState, WorkerId, Priority, CancelReq, JobInfo, JobOps)`
+
+Pending queue:
+ `("couch_workers", "pending", JobType, Priority, JobId) = ""`
+
+Active queue:
+ `("couch_workers", "active", WorkerType, Worker, JobId) = (JobState, JobInfo, JobOpts)`
+
+Worker registration and health:
+ `("couch_workers", WorkerType, "workers_vs") = Versionstamp`
+ `("couch_workers", WorkerType, "workers", WorkerId) = WorkerOpts`
+ `("couch_workers", WorkerType, "health", WorkerId) = (Versionstamp, Timestamp, WorkerTimeout)`
+
+Data model fields:
+
+- `JobType`: The job type. It MAY be `replication` or `indexing`; however,
+  specific job types are out of scope for this document.
+
+- `JobId` : MUST uniquely identify a job in a cluster. It MAY be a `UUID` or
+  derived from some other job parameters.
+
+- `JobState` : MUST be one of `pending`, `running`, or `completed`.
+
+- `Priority` : MAY be any key which allows sorting jobs, such as a timestamp,
+  or a tag.
+
+- `WorkerId` : MUST uniquely identify a worker in a cluster. It MAY be
+  generated each time the worker is restarted, or it could be persisted across
+  worker restarts, as long as it is unique across the whole cluster.
+
+- `CancelReq` : A boolean flag indicating a request to the worker to cancel the
+  job.
+
+- `JobOpts` : Type-specific job options. Represented as a JSON object. This MAY
+  contain fields like `"source"`, `"target"`, `"dbname"` etc.
+
+- `JobInfo` : A per-job info object. Represented as JSON. This object will
+  contain details about a job's current state. It MAY have fields such as
+  "update_seq", "error_count", etc.
+
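+To make the layout above concrete, here is a minimal sketch of how these keys
+might be constructed with the FoundationDB tuple layer. Python bindings are
+used for brevity; the helper names and the JSON value encoding used in later
+sketches are illustrative assumptions, not part of the proposal.
+
+```python
+import json
+
+import fdb
+
+fdb.api_version(620)
+db = fdb.open()
+
+# Top-level subspace for the whole framework.
+COUCH_WORKERS = fdb.Subspace(('couch_workers',))
+
+def job_key(job_type, job_id):
+    # ("couch_workers", "jobs", JobType, JobId)
+    return COUCH_WORKERS.pack(('jobs', job_type, job_id))
+
+def pending_key(job_type, priority, job_id):
+    # ("couch_workers", "pending", JobType, Priority, JobId)
+    return COUCH_WORKERS.pack(('pending', job_type, priority, job_id))
+
+def active_key(worker_type, worker_id, job_id):
+    # ("couch_workers", "active", WorkerType, WorkerId, JobId)
+    return COUCH_WORKERS.pack(('active', worker_type, worker_id, job_id))
+
+def health_key(worker_type, worker_id):
+    # ("couch_workers", WorkerType, "health", WorkerId)
+    return COUCH_WORKERS.pack((worker_type, 'health', worker_id))
+
+def workers_key(worker_type, worker_id):
+    # ("couch_workers", WorkerType, "workers", WorkerId)
+    return COUCH_WORKERS.pack((worker_type, 'workers', worker_id))
+
+def workers_vs_key(worker_type):
+    # ("couch_workers", WorkerType, "workers_vs")
+    return COUCH_WORKERS.pack((worker_type, 'workers_vs'))
+```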
+
+## Job Lifecycle
+
+New jobs are posted to the main jobs table along with an entry in the pending
+queue. Priority key assignment is type-specific. `WorkerId` is set to `nil`
+initially.
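+
+A sketch of job creation, using the illustrative key helpers and JSON value
+encoding from the data model sketch above (the actual on-disk value is the
+tuple shown in the data model):
+
+```python
+@fdb.transactional
+def add_job(tr, job_type, job_id, priority, job_opts):
+    # One transaction writes both the jobs table entry and the pending queue
+    # entry, so a job never exists in one place without the other.
+    job = {'state': 'pending', 'worker': None, 'priority': priority,
+           'cancel': False, 'info': {}, 'opts': job_opts}
+    tr[job_key(job_type, job_id)] = json.dumps(job).encode()
+    tr[pending_key(job_type, priority, job_id)] = b''
+```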
+
+Workers will monitor the pending queue matching their `JobType`. That is,
+`"replication"` workers will monitor the `"replication"` pending queue,
+indexing workers the `"indexing"` queue, etc. If there are jobs ready to run,
+they will attempt to grab one or more jobs from the pending queue and assign
+them to themselves for execution. They do that by setting the `WorkerId` in
+the jobs table entry to their worker ID, removing the job from the pending
+queue, and creating a new entry in their own `"active"` jobs area.
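+
+A sketch of how a worker might claim a pending job in a single transaction,
+continuing the helpers above. Claiming one job at a time and ordering purely
+by the priority key are illustrative choices:
+
+```python
+@fdb.transactional
+def accept_one_job(tr, job_type, worker_type, worker_id):
+    pending = COUCH_WORKERS['pending'][job_type]
+    rng = pending.range()
+    for key, _ in tr.get_range(rng.start, rng.stop, limit=1):
+        priority, job_id = pending.unpack(key)
+        # Point the jobs table entry at this worker...
+        jkey = job_key(job_type, job_id)
+        job = json.loads(tr[jkey].decode('utf-8'))
+        job.update(state='running', worker=worker_id)
+        tr[jkey] = json.dumps(job).encode()
+        # ...remove the job from the pending queue...
+        del tr[key]
+        # ...and record it in this worker's own "active" jobs area.
+        tr[active_key(worker_type, worker_id, job_id)] = json.dumps(job).encode()
+        return job_id
+    return None  # nothing pending
+```
+
+Two workers racing for the same pending key conflict at commit time and one of
+the transactions is retried, so each job is claimed by exactly one worker.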
+
+When a job finishes running, either with a successful completion or a
+terminal failure, the worker MUST remove the job from its active queue and
+clear its `WorkerId` field. Then, based on type-specific behavior, it MUST do
+one of the following (see the sketch after this list):
+
+ - Update the `JobState` to `"completed"` and leave the job in the jobs table.
+   It would be up to the job creation logic to inspect and remove it.
+ - Delete the job from the system altogether. This may be useful when running
+   `_replicate` jobs, for example.
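+
+A sketch of job completion covering both of the type-specific options above:
+
+```python
+@fdb.transactional
+def finish_job(tr, job_type, worker_type, worker_id, job_id, delete=False):
+    # The job leaves this worker's active queue in either case.
+    del tr[active_key(worker_type, worker_id, job_id)]
+    jkey = job_key(job_type, job_id)
+    if delete:
+        # Second option: drop the job from the system altogether.
+        del tr[jkey]
+    else:
+        # First option: mark it completed and clear WorkerId; the job
+        # creation logic inspects and removes it later.
+        job = json.loads(tr[jkey].decode('utf-8'))
+        job.update(state='completed', worker=None)
+        tr[jkey] = json.dumps(job).encode()
+```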
+
+
+If a user decides to cancel a job that is running, they MUST set `CancelReq`
+to `true` and then wait for the job to stop running. If the job is not
+running, they may directly remove the job from the pending queue and the main
+jobs table.
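+
+A sketch of the cancellation flow from the job creator's side, under the same
+assumed JSON value encoding:
+
+```python
+@fdb.transactional
+def request_cancel(tr, job_type, job_id):
+    jkey = job_key(job_type, job_id)
+    job = json.loads(tr[jkey].decode('utf-8'))
+    if job['state'] == 'running':
+        # Ask the owning worker to stop; the caller then waits for the job
+        # to leave the running state.
+        job['cancel'] = True
+        tr[jkey] = json.dumps(job).encode()
+    else:
+        # Not running: remove it from the pending queue and the jobs table.
+        del tr[pending_key(job_type, job['priority'], job_id)]
+        del tr[jkey]
+```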
+
+
+## Worker Lifecycle
+
+Workers can be dynamically started and stopped. When workers start, they MUST
+register themselves in the `"workers"` subspace based on their `WorkerType`.
+Each change to the workers list should be accompanied by an update to the
+`"workers_vs"` versionstamp. That versionstamp MAY be efficiently monitored
+for changes using a watch.
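+
+A sketch of worker registration. Maintaining `"workers_vs"` with
+`set_versionstamped_value` plus the tuple layer's incomplete `Versionstamp`,
+and observing it with a watch, are assumptions about how this could be done:
+
+```python
+@fdb.transactional
+def register_worker(tr, worker_type, worker_id, worker_opts):
+    # Add this worker to the membership list...
+    tr[workers_key(worker_type, worker_id)] = json.dumps(worker_opts).encode()
+    # ...and bump the membership versionstamp so watchers notice the change.
+    stamp = fdb.tuple.pack_with_versionstamp((fdb.tuple.Versionstamp(),))
+    tr.set_versionstamped_value(workers_vs_key(worker_type), stamp)
+
+@fdb.transactional
+def watch_workers(tr, worker_type):
+    # The returned future fires when the membership versionstamp changes,
+    # e.g. watch_workers(db, 'replication').wait()
+    return tr.watch(workers_vs_key(worker_type))
+```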
+
+
+### Worker Liveliness Monitoring
+
+As long as workers are active and ready to process jobs, they MUST
+periodically ping their health key. The ping period MUST be less than the
+`WorkerTimeout` value. Other worker nodes, for example one of the "neighbors"
+(to the left or to the right) of the worker, will monitor that health entry.
+If a worker dies unexpectedly, the neighbor MUST remove all of the dead
+worker's jobs from the active queue and submit them back into the pending
+queue.
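+
+A sketch of the health ping and of a neighbor re-enqueuing a dead worker's
+jobs. For simplicity the health value here is JSON with a timestamp and
+timeout rather than the `(Versionstamp, Timestamp, WorkerTimeout)` tuple from
+the data model:
+
+```python
+import time
+
+@fdb.transactional
+def ping_health(tr, worker_type, worker_id, worker_timeout):
+    # The period between pings MUST be less than worker_timeout.
+    val = {'updated_at': time.time(), 'timeout': worker_timeout}
+    tr[health_key(worker_type, worker_id)] = json.dumps(val).encode()
+
+@fdb.transactional
+def reap_dead_worker(tr, job_type, worker_type, dead_worker_id):
+    # Move every job held by the dead worker back to the pending queue.
+    active = COUCH_WORKERS['active'][worker_type][dead_worker_id]
+    rng = active.range()
+    for key, _ in tr.get_range(rng.start, rng.stop):
+        (job_id,) = active.unpack(key)
+        jkey = job_key(job_type, job_id)
+        job = json.loads(tr[jkey].decode('utf-8'))
+        job.update(state='pending', worker=None)
+        tr[jkey] = json.dumps(job).encode()
+        tr[pending_key(job_type, job['priority'], job_id)] = b''
+        del tr[key]
+```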
+
+
+### Liveliness Race Condition
+
+Workers might come back alive after failing to update their health entry in
+time, while another worker has already cleared and re-submitted their jobs.
+To handle this, workers MUST monitor their active queue; if they detect that
+an entry was cleared, or that the `WorkerId` was updated and they are no
+longer the current `WorkerId`, they MUST stop running the job and go back to
+the pending queue to pick new jobs to run.
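+
+A sketch of the ownership check a worker might run periodically while
+executing a job, to detect that it has been reaped by a neighbor:
+
+```python
+@fdb.transactional
+def still_owns_job(tr, job_type, worker_type, worker_id, job_id):
+    # If the active entry was cleared, or the jobs table names a different
+    # WorkerId, this worker lost the job and MUST stop running it.
+    if not tr[active_key(worker_type, worker_id, job_id)].present():
+        return False
+    job = json.loads(tr[job_key(job_type, job_id)].decode('utf-8'))
+    return job.get('worker') == worker_id
+```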
+
+### Clean Worker Shutdown
+
+If a worker is stopped cleanly, it MUST stop accepting new jobs from the
+pending queue, stop running its jobs, and then resubmit those jobs back into
+the pending queue. Finally, it MUST clear its health entry and remove itself
+from the workers membership list.
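+
+A sketch of a clean shutdown, reusing the re-enqueue helper from the
+health-monitoring sketch above. Doing everything in one transaction is an
+assumption; a real worker might re-enqueue its jobs incrementally:
+
+```python
+@fdb.transactional
+def deregister_worker(tr, job_type, worker_type, worker_id):
+    # Re-enqueue any jobs this worker was still running...
+    reap_dead_worker(tr, job_type, worker_type, worker_id)
+    # ...then clear the health entry and leave the membership list, bumping
+    # the membership versionstamp as is done for registration.
+    del tr[health_key(worker_type, worker_id)]
+    del tr[workers_key(worker_type, worker_id)]
+    stamp = fdb.tuple.pack_with_versionstamp((fdb.tuple.Versionstamp(),))
+    tr.set_versionstamped_value(workers_vs_key(worker_type), stamp)
+```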
+
+### User Job Cancellation
+
+If a user decides to cancel an active job, they MUST update the `CancelReq`
+field to `true`. Workers should periodically monitor that field and, if the
+field is `true`, they should stop running the job, update its job state, and
+remove it from their active queue.
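+
+A sketch of the worker-side check; a worker might poll this between batches
+of work and, when it returns true, stop and clean up with something like
+`finish_job` above:
+
+```python
+@fdb.transactional
+def cancel_requested(tr, job_type, job_id):
+    # Polled periodically by the worker while the job is running.
+    job = json.loads(tr[job_key(job_type, job_id)].decode('utf-8'))
+    return job.get('cancel', False)
+```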
+
+## `/_active_tasks` API Compatibility
+
+The shape of the active queue was deliberately constructed with the hope of
+using it to implement the `/_active_tasks` API endpoint.
+
+When the HTTP API is accessed, a single `get_range(...)` request over the
+`("couch_workers", "active")` subspace MAY be used to generate the results.
+
+# Advantages and Disadvantages
+
+Another alternative discussed on the mailing list is to mimic the
+pre-FoundationDB replication job scheduling model, where replication jobs are
+distributed between all the workers immediately when they are created. Each
+worker has its own separate job queue. The advantage of that model is that
+the queues and jobs are sharded evenly amongst the workers, avoiding
+potentially hot key ranges around global queue pushes and pops. The
+disadvantage is that it seemed like premature optimization, and that it only
+allows implementing a completely fair scheduling algorithm, as opposed to
+allowing workers to throttle the number of jobs they pick up from the global
+queue based on load or other scheduling parameters.
+
+An alternative to health monitoring of individual workers is to sort jobs
+based on a `LastUpdated` timestamp field and then automatically re-enqueue
+jobs that have not been updated in a while. The disadvantage of that approach
+is that, even though it might simplify the data model, it doesn't eliminate
+the need for workers. Since workers are already present, it might be useful
+to expose their state explicitly so it can be inspected and monitored along
+with the job queue.
+
+# Key Changes
+
+ - New job execution framework
+ - A single global job queue for each job type
+ - Workers can dynamically scale up or down
+ - Worker health monitoring to detect dead workers
+ - Potential `_active_tasks` API implementation
+
+## Applications and Modules affected
+
+Replication and indexing
+
+## HTTP API additions
+
+None.
+
+## HTTP API deprecations
+
+There may be incompatibilities in the results returned from the
+`_active_tasks`, `_scheduler/jobs` and `_scheduler/docs` API endpoints.
+
+# Security Considerations
+
+None have been identified.
+
+# References
+
+[Original mailing list discussion](https://lists.apache.org/thread.html/9338bd50f39d7fdec68d7ab2441c055c166041bd84b403644f662735@%3Cdev.couchdb.apache.org%3E)
+
+# Acknowledgements
+
+ - @kocolosk
+ - @davisp
+ - @garrensmith
+ - @rnewson
+ - @mikerhodes
+