Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2019/06/04 13:06:41 UTC

[GitHub] [couchdb-documentation] garrensmith commented on a change in pull request #409: RFC for CouchDB background workers

garrensmith commented on a change in pull request #409: RFC for CouchDB background workers
URL: https://github.com/apache/couchdb-documentation/pull/409#discussion_r290285563
 
 

 ##########
 File path: rfcs/007-background-jobs.md
 ##########
 @@ -0,0 +1,350 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Background jobs with FoundationDB'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+This document describes a data model, implementation, and an API for running
+CouchDB background jobs with FoundationDB.
+
+## Abstract
+
+CouchDB background jobs are used for things like index building, replication
+and couch-peruser processing. We present a generalized model which allows
+creation, running, and monitoring of these jobs.
+
+The document starts with a description of the framework API in Erlang
+pseudo-code, then we show the data model, followed by the implementation
+details.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+---
+
+`Job`: A unit of work, identified by a `JobId` and also having a `Type`.
+
+`Worker` : A language-specific execution unit that runs the job. Could be an
+Erlang process, a thread, or just a function.
+
+`Job table`: An FDB subspace holding the list of jobs.
+
+`Pending job`: A job that is waiting to run.
+
+`Pending queue` : A queue of pending jobs ordered by priority.
+
+`Running job`: A job which is currently executing. To be considered "running"
+the worker must periodically update the job's state in the global job table.
+
+`Priority`: A job's priority specifies its order in the pending queue. Priority
+can be any term that can be encoded as a key in FoundationDB's tuple layer. The
+exact value of `Priority` is job type specific. It MAY be a rough timestamp, a
+`Sequence`, a list of tags, etc.
+
+`Job re-submission` : Re-submitting a job means putting a previously running
+job back into the pending queue.
+
+`Activity monitor` : Functionality implemented by the framework which checks
+job liveness (activity). If workers don't update their status often enough, the
+activity monitor will re-enqueue their jobs as pending. This ensures jobs make
+progress even if some workers terminate unexpectedly.
+
+`JobState`: Describes the current state of the job. The possible values are
+`"running"`, `"pending"`, and `"finished"`. This is the minimal set of
+states needed to describe a job's behavior with respect to this framework. Each
+job type MAY have additional, type-specific states, such as `"failed"`,
+`"error"`, `"retrying"`, etc.
+
+`Sequence`: A 13-byte value formed by combining the current `Incarnation` of
+the database and the `Versionstamp` of the transaction. Sequences are
+monotonically increasing even when a database is relocated across FoundationDB
+clusters. See (RFC002) for a full explanation.
+
+---
+
+# Framework API
+
+This section describes the job creation and worker implementation APIs. It doesn't
+describe how the framework is implemented. The intended audience is CouchDB
+developers using this framework to implement background jobs for indexing,
+replication, and couch-peruser.
+
+Both the job creation and the worker implementation APIs use a `JobOpts` map to
+represent a job. The map MAY contain these top-level fields:
+
+  * `"priority"` : Holds the `Priority` value of the job. `Priority` is
+    job-type specific.
+  * `"data"`: An opaque object (map), from the framework's point of view,
+    containing job-type specific data. It MAY contain an update sequence, or an
+    error message, for example.
+  * `"cancel"` : Boolean field defaulting to `false`. If `true`, it indicates the
+    user intends to stop a job's execution.
+  * `"resubmit"` : Boolean field defaulting to `false`. If `true`, it indicates
+    the job should be re-submitted.
+
+### Job Creation API ###
+
+```
+add(Type, JobId, JobOpts) -> ok | {error, Error}
+```
+ - Add a job to be executed by a background worker.
+
+```
+remove(Type, JobId) -> ok | not_found
+```
+ - Remove a job. If it is running, it will be stopped, then it will be removed
+   from the job table.
+
+```
+resubmit(Type, JobId) -> ok | not_found
+```
+ - Indicates that the job should be re-submitted for execution.
+
+```
+get_job(Type, JobId) -> {ok, JobOpts, JobState}
+```
+ - Return `JobOpts` and the `JobState`. `JobState` value MAY be:
+  * `"pending"` : This job is pending.
+  * `"running"` : This job is currently running.
+  * `"finished"` : This job has finished running and is not pending.
+
+### Worker Implementation API
+
+This API is to be used when implementing workers for various job types. The general pattern
+is to call `accept()` from something like a job manager, then for each accepted
+job spawn a worker to execute it, and then resume calling `accept()` to get
+other jobs. When a job is running, the worker MUST periodically call `update()`
+to prevent the activity monitor from re-enqueueing it. When the worker decides to stop
+running a job, it MUST call `finish()` to indicate that the job has finished running.
+
+```
+accept(Type[, MaxPriority]) -> {ok, JobId, WorkerLockId} | not_found
+```
+
+ - Dequeue a job from the pending queue. `WorkerLockId` is a UUID indicating
 
 Review comment:
   If we are renaming workers to job processing, we should probably rename this to JobLockId to be consistent.
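
For context, the accept/update/finish flow described in the hunk above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not the RFC's implementation: the `JobQueue` class, the `now` and `timeout` parameters, `check_activity()`, and the exact signatures of `update()` and `finish()` are all hypothetical (the diff is cut off before those signatures appear). It also assumes that a smaller priority value is dequeued first, and that the lock id returned by `accept()` is a token the caller presents on later calls. Python is used here only to keep the sketch runnable; the RFC itself targets Erlang and FoundationDB.

```python
import heapq
import uuid
from collections import defaultdict


class JobQueue:
    """Toy in-memory stand-in for the job table and pending queue (hypothetical)."""

    def __init__(self):
        self.jobs = {}                    # (type, job_id) -> job record
        self.pending = defaultdict(list)  # type -> heap of (priority, job_id)

    def add(self, job_type, job_id, opts):
        # A newly added job starts in the "pending" state.
        self.jobs[(job_type, job_id)] = {
            "opts": opts, "state": "pending", "lock": None, "last_update": None}
        heapq.heappush(self.pending[job_type], (opts.get("priority", 0), job_id))

    def get_job(self, job_type, job_id):
        job = self.jobs[(job_type, job_id)]
        return job["opts"], job["state"]

    def accept(self, job_type, now):
        # Dequeue the pending job with the smallest priority value (assumption).
        if not self.pending[job_type]:
            return None
        _, job_id = heapq.heappop(self.pending[job_type])
        job = self.jobs[(job_type, job_id)]
        job["state"] = "running"
        job["lock"] = str(uuid.uuid4())  # fresh lock id for this run
        job["last_update"] = now
        return job_id, job["lock"]

    def update(self, job_type, job_id, lock_id, now):
        # Heartbeat: only the holder of the current lock may refresh the job.
        job = self.jobs[(job_type, job_id)]
        if job["state"] != "running" or job["lock"] != lock_id:
            return False
        job["last_update"] = now
        return True

    def finish(self, job_type, job_id, lock_id):
        # Mark the job finished, again only for the current lock holder.
        job = self.jobs[(job_type, job_id)]
        if job["state"] != "running" or job["lock"] != lock_id:
            return False
        job["state"] = "finished"
        job["lock"] = None
        return True

    def check_activity(self, now, timeout):
        # Activity monitor: re-enqueue running jobs whose last update is too old.
        for (job_type, job_id), job in self.jobs.items():
            if job["state"] == "running" and now - job["last_update"] > timeout:
                job["state"] = "pending"
                job["lock"] = None
                heapq.heappush(self.pending[job_type],
                               (job["opts"].get("priority", 0), job_id))
```

In this model a job that stops calling `update()` is moved back to pending by `check_activity()`, after which the old lock id is rejected by both `update()` and `finish()`. The lock is tied to one run of a job rather than to a particular worker, which is one way to read the naming suggestion above.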
