Posted to commits@couchdb.apache.org by kx...@apache.org on 2013/07/24 14:24:46 UTC

[09/50] [abbrv] git commit: updated refs/heads/1781-reorganize-and-improve-docs to fa11c25

Add replication protocol definition.

COUCHDB-1824


Project: http://git-wip-us.apache.org/repos/asf/couchdb/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb/commit/ae9ceacc
Tree: http://git-wip-us.apache.org/repos/asf/couchdb/tree/ae9ceacc
Diff: http://git-wip-us.apache.org/repos/asf/couchdb/diff/ae9ceacc

Branch: refs/heads/1781-reorganize-and-improve-docs
Commit: ae9ceacc8fe3aa0fdd9b6ff2a201e4cd76487953
Parents: 4ecafeb
Author: Alexander Shorin <kx...@apache.org>
Authored: Tue Jul 23 22:41:28 2013 +0400
Committer: Alexander Shorin <kx...@apache.org>
Committed: Wed Jul 24 10:48:37 2013 +0400

----------------------------------------------------------------------
 share/doc/build/Makefile.am             |   3 +
 share/doc/src/replications/index.rst    |   1 +
 share/doc/src/replications/protocol.rst | 201 +++++++++++++++++++++++++++
 3 files changed, 205 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/couchdb/blob/ae9ceacc/share/doc/build/Makefile.am
----------------------------------------------------------------------
diff --git a/share/doc/build/Makefile.am b/share/doc/build/Makefile.am
index 1580f85..cb60fba 100644
--- a/share/doc/build/Makefile.am
+++ b/share/doc/build/Makefile.am
@@ -70,6 +70,7 @@ html_files = \
     html/_sources/config/proxying.txt \
     html/_sources/replications/index.txt \
     html/_sources/replications/intro.txt \
+    html/_sources/replications/protocol.txt \
     html/_sources/replications/replicator.txt \
     html/_sources/changelog.txt \
     html/_sources/changes.txt \
@@ -127,6 +128,7 @@ html_files = \
     html/config/proxying.html \
     html/replications/index.html \
     html/replications/intro.html \
+    html/replications/protocol.html \
     html/replications/replicator.html \
     html/changelog.html \
     html/changes.html \
@@ -182,6 +184,7 @@ src_files = \
     ../src/config/proxying.rst \
     ../src/replications/index.rst \
     ../src/replications/intro.rst \
+    ../src/replications/protocol.rst \
     ../src/replications/replicator.rst \
     ../src/changelog.rst \
     ../src/changes.rst \

http://git-wip-us.apache.org/repos/asf/couchdb/blob/ae9ceacc/share/doc/src/replications/index.rst
----------------------------------------------------------------------
diff --git a/share/doc/src/replications/index.rst b/share/doc/src/replications/index.rst
index b626bc6..940a29c 100644
--- a/share/doc/src/replications/index.rst
+++ b/share/doc/src/replications/index.rst
@@ -33,3 +33,4 @@ destination database.
 
    intro
    replicator
+   protocol

http://git-wip-us.apache.org/repos/asf/couchdb/blob/ae9ceacc/share/doc/src/replications/protocol.rst
----------------------------------------------------------------------
diff --git a/share/doc/src/replications/protocol.rst b/share/doc/src/replications/protocol.rst
new file mode 100644
index 0000000..bf478f7
--- /dev/null
+++ b/share/doc/src/replications/protocol.rst
@@ -0,0 +1,201 @@
+.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
+.. use this file except in compliance with the License. You may obtain a copy of
+.. the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+.. License for the specific language governing permissions and limitations under
+.. the License.
+
+.. _replication/protocol:
+
+============================
+CouchDB Replication Protocol
+============================
+
+The **CouchDB Replication protocol** is a protocol for synchronizing
+documents between two peers over HTTP/1.1.
+
+Language
+--------
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in :rfc:`2119`.
+
+
+Goals
+-----
+
+The CouchDB Replication protocol synchronizes documents between two peers
+over HTTP/1.1.
+
+In theory, the CouchDB protocol can be used between any products that
+implement it. The reference implementation, written in Erlang_, is
+provided by the couch_replicator_ module available in Apache CouchDB.
+
+
+The CouchDB_ replication protocol uses the `CouchDB REST API
+<http://wiki.apache.org/couchdb/Reference>`_ and so is based on HTTP and
+the Apache CouchDB MVCC data model. The primary goal of this
+specification is to describe the CouchDB replication algorithm.
+
+
+Definitions
+-----------
+
+ID
+    An identifier (possibly a UUID, as described in :rfc:`4122`).
+
+Sequence
+    An ID provided by the changes feed. It may be numeric, but is not
+    necessarily so.
+
+Revision
+    (to define)
+
+Document
+    A JSON entity with a unique ID and revision.
+
+Database
+    A collection of Documents with a unique URI.
+
+URI
+    A URI as defined by :rfc:`2396`. It can be a URL as defined
+    in :rfc:`1738`.
+
+Source
+    The Database from which Documents are replicated.
+
+Target
+    The Database to which Documents are replicated.
+
+Checkpoint
+    Last source sequence ID
+
+
+Algorithm
+---------
+
+1. Get unique identifiers for the Source and Target based on their URIs if
+   a replication task ID is not available.
+
+2. Save this identifier in a special Document named ``_local/<uniqueid>``
+   on the Target database. This Document isn't replicated. It holds
+   the last Source sequence ID, the Checkpoint, from the
+   previous replication process.
+
+3. Get the Source changes feed by calling the ``/<source>/_changes`` URL,
+   passing it the Checkpoint in the ``since`` parameter. The
+   changes feed only returns a list of current revisions.
+
+
+.. note::
+
+    This step can be done continuously using the ``feed=longpoll`` or
+    ``feed=continuous`` parameters. The feed then keeps delivering
+    changes as they occur.
+
+
+4. Collect a group of Document/Revision ID pairs from the **changes
+   feed** and send them to the Target database on the
+   ``/<target>/_revs_diff`` URL. The result will contain the list of
+   revisions **NOT** in the Target.
+
+5. GET each revision from the Source database by calling the URL
+   ``/<source>/<docid>?revs=true&open_revs=<revision>``. This
+   will get the document with its parent revisions. Also don't forget to
+   get attachments that aren't already stored at the Target. As an
+   optimisation you can use the HTTP multipart API to get them all.
+
+6. Collect a group of revisions fetched at the previous step and store them
+   on the Target database using the `Bulk Docs
+   <http://wiki.apache.org/couchdb/HTTP_Document_API#Bulk_Docs>`_ API
+   with the ``new_edits: false`` JSON property to preserve their revision
+   IDs.
+
+7. After the group of revisions is stored on the Target, save
+   the new Checkpoint on the Source.
+
+
+.. note::
+
+    - Even if some revisions have been ignored, their sequence should be
+      taken into consideration for the Checkpoint.
+
+    - To compare the ordering of non-numeric sequences, you will have to keep
+      an ordered list of the sequence IDs as they appear in the ``_changes``
+      feed and compare their indices.
+
+Filter replication
+------------------
+
+Replication can be filtered by passing the ``filter`` parameter, with a
+function name, to the changes feed. The named function is called on each
+change. If the function returns true, the document is added to the
+feed.
+
+
+Optimisations
+-------------
+
+- The system should run the steps in parallel to reduce latency.
+
+- The number of revisions passed to steps 3 and 6 should be large
+  enough to reduce both bandwidth use and latency.
+
+
+API Reference
+-------------
+
+- :ref:`api/db.head` -- Check Database existence
+- :ref:`api/db/ensure_full_commit` -- Ensure that all changes are stored on disk
+- :ref:`api/local/doc.get` -- Read the last Checkpoint
+- :ref:`api/local/doc.put` -- Save a new Checkpoint
+
+Push Only
+~~~~~~~~~
+
+- :ref:`api/db.put` -- Create Target if it does not exist and option was provided
+- :ref:`api/db/revs_diff.post` -- Locate Revisions that are not known to the
+  Target
+- :ref:`api/db/bulk_docs.post` -- Upload Revisions to the Target
+- :ref:`api/doc.put`?new_edits=false -- Upload a single Document with
+  attachments to the Target
+
+Pull Only
+~~~~~~~~~
+
+- :ref:`api/db/changes.get` -- Locate changes on the Source since the last pull.
+  The request uses the following query parameters:
+
+  - ``style=all_docs``
+  - ``feed=feed``, where feed is :ref:`normal <changes/normal>` or
+    :ref:`longpoll <changes/longpoll>`
+  - ``limit=limit``
+  - ``heartbeat=heartbeat``
+
+- :ref:`api/doc.get` -- Retrieve a single Document from Source with attachments.
+  The request uses the following query parameters:
+
+  - ``open_revs=revid`` - where ``revid`` is the actual Document Revision at the
+    moment of the pull request
+  - ``revs=true``
+  - ``atts_since=lastrev``
+
+Reference
+---------
+
+* `TouchDB iOS wiki <https://github.com/couchbaselabs/TouchDB-iOS/wiki/Replication-Algorithm>`_
+* `CouchDB documentation
+  <http://wiki.apache.org/couchdb/Replication>`_
+* CouchDB `change notifications`_
+
+.. _CouchDB: http://couchdb.apache.org
+.. _Erlang: http://erlang.org
+.. _couch_replicator: https://github.com/apache/couchdb/tree/master/src/couch_replicator
+.. _change notifications: http://guide.couchdb.org/draft/notifications.html
+
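
Steps 2 to 7 of the Algorithm section in protocol.rst above can be sketched with in-memory stand-ins instead of HTTP. This is only an illustration under stated assumptions, not the couch_replicator implementation: `Peer`, `write`, `changes_since`, `revs_diff`, `bulk_docs` and `replicate_once` are hypothetical names that mirror the roles of `_changes`, `_revs_diff` and Bulk Docs with `new_edits: false`. Sequences are assumed to be numeric, and the Checkpoint is written to both peers, which is a choice made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical in-memory stand-in for a database (not a CouchDB API)."""
    # docs maps Document ID -> {revision ID: body}
    docs: dict = field(default_factory=dict)
    # changes is an ordered list of (sequence, Document ID, revision ID)
    changes: list = field(default_factory=list)
    # local holds non-replicated documents such as the Checkpoint
    local: dict = field(default_factory=dict)

    def write(self, doc_id, rev, body):
        """Store a revision and record it in the changes list."""
        self.docs.setdefault(doc_id, {})[rev] = body
        self.changes.append((len(self.changes) + 1, doc_id, rev))

    def changes_since(self, since):
        """Plays the role of /<db>/_changes?since=<since>."""
        return [c for c in self.changes if c[0] > since]

    def revs_diff(self, pairs):
        """Plays the role of /<db>/_revs_diff: pairs NOT stored here."""
        return [(d, r) for d, r in pairs if r not in self.docs.get(d, {})]

    def bulk_docs(self, revisions):
        """Plays the role of Bulk Docs with new_edits: false (IDs kept)."""
        for doc_id, rev, body in revisions:
            self.write(doc_id, rev, body)


def replicate_once(source, target, rep_id):
    """Run one pass of steps 2 to 7; return the number of revisions copied."""
    checkpoint = target.local.get(rep_id, 0)                    # step 2
    feed = source.changes_since(checkpoint)                     # step 3
    if not feed:
        return 0
    missing = target.revs_diff([(d, r) for _, d, r in feed])    # step 4
    fetched = [(d, r, source.docs[d][r]) for d, r in missing]   # step 5
    target.bulk_docs(fetched)                                   # step 6
    # Step 7: the Checkpoint advances to the last sequence seen, even when
    # some revisions were skipped because the Target already had them.
    # Writing it to both peers is an assumption of this sketch.
    last_seq = feed[-1][0]
    source.local[rep_id] = target.local[rep_id] = last_seq
    return len(fetched)


source, target = Peer(), Peer()
source.write("doc-a", "1-abc", {"n": 1})
source.write("doc-b", "1-def", {"n": 2})
target.write("doc-a", "1-abc", {"n": 1})   # the Target already has this one
copied = replicate_once(source, target, "rep-1")
```

In this example only `doc-b` needs to be copied, and a second call with the same `rep_id` finds nothing newer than the stored Checkpoint.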