Posted to commits@couchdb.apache.org by kx...@apache.org on 2013/07/24 14:24:50 UTC

[13/50] [abbrv] git commit: updated refs/heads/1781-reorganize-and-improve-docs to fa11c25

Add comparison of replication protocol with Git.

Thanks Jason Smith for the post:
http://stackoverflow.com/a/4766398/965635


Project: http://git-wip-us.apache.org/repos/asf/couchdb/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb/commit/c199cd9b
Tree: http://git-wip-us.apache.org/repos/asf/couchdb/tree/c199cd9b
Diff: http://git-wip-us.apache.org/repos/asf/couchdb/diff/c199cd9b

Branch: refs/heads/1781-reorganize-and-improve-docs
Commit: c199cd9b6392689213a7b85e722ebf9daa7cd476
Parents: 59b7d7f
Author: Alexander Shorin <kx...@apache.org>
Authored: Tue Jul 23 22:49:35 2013 +0400
Committer: Alexander Shorin <kx...@apache.org>
Committed: Wed Jul 24 10:48:37 2013 +0400

----------------------------------------------------------------------
 share/doc/src/replications/conflicts.rst | 85 +++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/couchdb/blob/c199cd9b/share/doc/src/replications/conflicts.rst
----------------------------------------------------------------------
diff --git a/share/doc/src/replications/conflicts.rst b/share/doc/src/replications/conflicts.rst
index bcd948d..404d256 100644
--- a/share/doc/src/replications/conflicts.rst
+++ b/share/doc/src/replications/conflicts.rst
@@ -699,3 +699,88 @@ size, but CouchDB discards them. If you are constantly updating a document,
 the size of a git repo would grow forever. It is possible (with some effort)
 to use "history rewriting" to make git forget commits earlier than a particular
 one.
+
+
+What is the CouchDB replication protocol? Is it like Git?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+**Key points**
+
+**If you know Git, then you know how Couch replication works.** Replicating is
+*very* similar to pushing or pulling with distributed version control systems
+like Git.
+
+**CouchDB replication does not have its own protocol.** A replicator simply
+connects to both DBs as an ordinary client, then reads from one and writes to
+the other. Push replication reads the local data and updates the remote DB;
+pull replication does the reverse.
+
+* **Fun fact 1**: The replicator is actually an independent Erlang application,
+  in its own process. It connects to both couches, then reads records from one
+  and writes them to the other.
+* **Fun fact 2**: CouchDB has no way of knowing who is a normal client and who
+  is a replicator (let alone whether the replication is push or pull).
+  It all looks like client connections. Some of them read records. Some of them
+  write records.
+
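A minimal Python sketch of the idea above, assuming a hypothetical in-memory
``ToyDB`` class in place of a real CouchDB database. The real replicator talks
to HTTP endpoints such as ``_changes``, ``_revs_diff`` and ``_bulk_docs``; the
methods ``changes``, ``missing_revs``, ``get_rev`` and ``put_rev`` below are
illustrative stand-ins, not CouchDB's API. The point is that ``replicate`` is
just a client of both databases: push and pull are the same function with the
arguments swapped.

```python
class ToyDB:
    """Hypothetical stand-in for a CouchDB database (not CouchDB's API).

    Stores doc_id -> {rev_id: body}. It offers only ordinary client
    operations, so it cannot tell an application from a replicator.
    """

    def __init__(self):
        self.docs = {}

    def changes(self):
        # Loosely analogous to the _changes feed: every doc and its revisions.
        return [(doc_id, sorted(revs)) for doc_id, revs in self.docs.items()]

    def missing_revs(self, doc_id, rev_ids):
        # Loosely analogous to _revs_diff: which of these revisions do I lack?
        have = self.docs.get(doc_id, {})
        return [rev_id for rev_id in rev_ids if rev_id not in have]

    def get_rev(self, doc_id, rev_id):
        return self.docs[doc_id][rev_id]

    def put_rev(self, doc_id, rev_id, body):
        # Store the revision as-is, keeping its existing revision ID.
        self.docs.setdefault(doc_id, {})[rev_id] = body


def replicate(source, target):
    """Copy each revision the target lacks, one record at a time.

    Returns the number of revisions copied.
    """
    copied = 0
    for doc_id, rev_ids in source.changes():
        for rev_id in target.missing_revs(doc_id, rev_ids):
            target.put_rev(doc_id, rev_id, source.get_rev(doc_id, rev_id))
            copied += 1
    return copied


local, remote = ToyDB(), ToyDB()
local.put_rev("jason", "1-abc", {"name": "Jason", "awesome": True})
print(replicate(local, remote))   # push: copies 1 revision, prints 1
print(replicate(remote, local))   # pull: nothing new, prints 0
```

Because ``ToyDB`` exposes nothing replication-specific, the same ``replicate``
function covers push and pull, which mirrors Fun fact 2.
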
+**Everything flows from the data model**
+
+The replication algorithm is trivial and uninteresting. A trained monkey could
+design it. It is simple because the cleverness lies in the data model, which has
+these useful characteristics:
+
+#. Every record in CouchDB is completely independent of all others. That sucks
+   if you want to do a JOIN or a transaction, but it's awesome if you want to
+   write a replicator. Just figure out how to replicate one record, and then
+   repeat that for each record.
+#. Like Git, records have a linked-list revision history. A record's first
+   revision ID is the checksum of its own data. Each subsequent revision ID is a
+   checksum of the new data plus the previous revision ID.
+
+#. In addition to application data (``{"name": "Jason", "awesome": true}``),
+   every record stores the evolutionary timeline of all previous revision IDs
+   leading up to itself.
+
+   - Exercise: Take a moment of quiet reflection. Consider any two different
+     versions of a record, A and B. If A's revision ID appears in B's timeline,
+     then B definitely evolved from A. Now consider Git's fast-forward merges.
+     Do you hear that? That is the sound of your mind being blown.
+
+#. Git history isn't really a linear list. It has forks, where one parent has
+   multiple children. CouchDB has that too.
+
+   - Exercise: Compare two different versions of a record, A and B. Neither
+     revision ID appears in the other's timeline; however, one revision ID, C,
+     is in both A's and B's timelines. Thus A didn't evolve from B, and B didn't
+     evolve from A. Rather, A and B have a common ancestor, C. In Git, that is a
+     "fork." In CouchDB, it's a "conflict."
+
+   - In Git, if both children go on to develop their timelines independently,
+     that's cool. Forks totally support that.
+   - In CouchDB, if both children go on to develop their timelines
+     independently, that's cool too. Conflicts totally support that.
+   - **Fun fact 3**: CouchDB "conflicts" do not correspond to Git "conflicts."
+     A Couch conflict is a divergent revision history, what Git calls a "fork."
+     For this reason the CouchDB community pronounces "conflict" with a silent
+     ``n``: "co-flicked."
+
+#. Git also has merges, where one child has multiple parents. CouchDB *sort* of
+   has that too.
+
+   - **In the data model, there is no merge.** The client simply marks one
+     timeline as deleted and continues to work with the only extant timeline.
+   - **In the application, it feels like a merge.** Typically, the client merges
+     the *data* from each timeline in an application-specific way.
+     Then it writes the new data to the surviving timeline. In Git, this is like
+     copying and pasting the changes from branch A into branch B, then
+     committing to branch B and deleting branch A. The data was merged, but
+     there was no ``git merge``.
+   - These behaviors are different because, in Git, the timeline itself is
+     important; but in CouchDB, the data is important and the timeline is
+     incidental: it's just there to support replication. That is one reason why
+     CouchDB's built-in revisioning is inappropriate for storing revision
+     history, such as the edit history of a wiki page.
+
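The data-model points above can be sketched in Python as follows. This is an
illustration under stated assumptions, not CouchDB's implementation: revision
IDs here are SHA-1 hashes of the JSON data plus the parent's revision ID (real
CouchDB revision IDs look like ``N-<hash>`` and are computed differently), and
each revision carries its full timeline as a plain list. ``make_rev``,
``descends_from``, ``common_ancestor``, ``is_conflict`` and ``resolve`` are
hypothetical helper names.

```python
import hashlib
import json


def make_rev(data, parent=None):
    """Create a revision whose ID chains onto its parent's ID (hypothetical scheme)."""
    parent_id = parent["id"] if parent else ""
    payload = json.dumps(data, sort_keys=True) + parent_id
    rev_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    # Every revision stores the timeline of all revision IDs leading up to it.
    timeline = (parent["timeline"] if parent else []) + [rev_id]
    return {"id": rev_id, "data": data, "timeline": timeline}


def descends_from(child, ancestor):
    """True if ancestor's ID appears in child's timeline (a Git fast-forward)."""
    return ancestor["id"] in child["timeline"]


def common_ancestor(a, b):
    """Most recent revision ID shared by both timelines, or None."""
    shared = set(b["timeline"])
    for rev_id in reversed(a["timeline"]):
        if rev_id in shared:
            return rev_id
    return None


def is_conflict(a, b):
    """Neither descends from the other, but they share an ancestor: a fork."""
    return (not descends_from(a, b) and not descends_from(b, a)
            and common_ancestor(a, b) is not None)


def resolve(keep, drop, merge_data):
    """Application-level "merge": no revision ever gets two parents.

    The merged data becomes a new revision on keep's timeline, and drop's
    timeline is ended with a deletion marker.
    """
    merged = make_rev(merge_data(keep["data"], drop["data"]), parent=keep)
    tombstone = make_rev({"_deleted": True}, parent=drop)
    return merged, tombstone


c = make_rev({"name": "Jason"})
a = make_rev({"name": "Jason", "awesome": True}, parent=c)
b = make_rev({"name": "Jason", "lang": "Erlang"}, parent=c)
print(descends_from(a, c), is_conflict(a, b))  # True True
merged, tombstone = resolve(a, b, lambda x, y: {**y, **x})
print(merged["data"])  # {'name': 'Jason', 'lang': 'Erlang', 'awesome': True}
```

``resolve`` never creates a revision with two parents, which is the "in the
data model, there is no merge" point; ``merge_data`` marks where the
application-specific merge logic would live.
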
+**Final notes**
+
+At least one sentence in this writeup (possibly this one) is complete BS.
+