Posted to commits@cassandra.apache.org by sl...@apache.org on 2016/06/27 18:34:00 UTC

[05/34] cassandra git commit: Fill in Replication, Tuneable Consistency sections

Fill in Replication, Tuneable Consistency sections


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b1edbd12
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b1edbd12
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b1edbd12

Branch: refs/heads/trunk
Commit: b1edbd12146b483516eaf3a90745ac664f46d609
Parents: 62e3d7d
Author: Tyler Hobbs <ty...@gmail.com>
Authored: Wed Jun 15 17:23:11 2016 -0500
Committer: Sylvain Lebresne <sy...@datastax.com>
Committed: Thu Jun 16 12:23:52 2016 +0200

----------------------------------------------------------------------
 doc/source/architecture.rst | 96 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 94 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/cassandra/blob/b1edbd12/doc/source/architecture.rst
----------------------------------------------------------------------
diff --git a/doc/source/architecture.rst b/doc/source/architecture.rst
index 37a0027..3f8a8ca 100644
--- a/doc/source/architecture.rst
+++ b/doc/source/architecture.rst
@@ -43,12 +43,104 @@ Token Ring/Ranges
 Replication
 ^^^^^^^^^^^
 
-.. todo:: todo
+The replication strategy of a keyspace determines which nodes are replicas for a given token range. The two main
+replication strategies are :ref:`simple-strategy` and :ref:`network-topology-strategy`.
+
+.. _simple-strategy:
+
+SimpleStrategy
+~~~~~~~~~~~~~~
+
+SimpleStrategy allows a single integer ``replication_factor`` to be defined. This determines the number of nodes that
+should contain a copy of each row.  For example, if ``replication_factor`` is 3, then three different nodes should store
+a copy of each row.
+
+SimpleStrategy treats all nodes identically, ignoring any configured datacenters or racks.  To determine the replicas
+for a token range, Cassandra iterates through the tokens in the ring, starting with the token range of interest.  For
+each token, it checks whether the owning node is already in the set of replicas and, if not, adds it.  This process
+continues until ``replication_factor`` distinct nodes have been added to the set of replicas.
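+
+As a concrete sketch, a keyspace using SimpleStrategy with three replicas could be created as follows (the keyspace
+name ``example_ks`` is only a placeholder)::
+
+    CREATE KEYSPACE example_ks
+        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};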
+
+.. _network-topology-strategy:
+
+NetworkTopologyStrategy
+~~~~~~~~~~~~~~~~~~~~~~~
+
+NetworkTopologyStrategy allows a replication factor to be specified for each datacenter in the cluster.  Even if your
+cluster only uses a single datacenter, NetworkTopologyStrategy should be preferred over SimpleStrategy to make it easier
+to add new physical or virtual datacenters to the cluster later.
+
+In addition to allowing the replication factor to be specified per-DC, NetworkTopologyStrategy also attempts to choose
+replicas within a datacenter from different racks.  If the number of racks is greater than or equal to the replication
+factor for the DC, each replica will be chosen from a different rack.  Otherwise, each rack will hold at least one
+replica, but some racks may hold more than one. Note that this rack-aware behavior has some potentially `surprising
+implications <https://issues.apache.org/jira/browse/CASSANDRA-3810>`_.  For example, if the racks do not contain an
+equal number of nodes, the data load on the smallest rack may be much higher.  Similarly, if a single node is
+bootstrapped into a new rack, it will be considered a replica for the entire ring.  For this reason, many operators
+choose to configure all nodes on a single "rack".
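+
+As a sketch, a keyspace replicated to two hypothetical datacenters named ``DC1`` and ``DC2`` could be defined as
+follows; the datacenter names must match those reported by the cluster's snitch::
+
+    CREATE KEYSPACE example_ks
+        WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};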
 
 Tunable Consistency
 ^^^^^^^^^^^^^^^^^^^
 
-.. todo:: todo
+Cassandra supports a per-operation tradeoff between consistency and availability through *Consistency Levels*.
+Essentially, an operation's consistency level specifies how many of the replicas need to respond to the coordinator in
+order to consider the operation a success.
+
+The following consistency levels are available:
+
+``ONE``
+  Only a single replica must respond.
+
+``TWO``
+  Two replicas must respond.
+
+``THREE``
+  Three replicas must respond.
+
+``QUORUM``
+  A majority of the replicas must respond (``floor(n / 2) + 1``, where ``n`` is the number of replicas).
+
+``ALL``
+  All of the replicas must respond.
+
+``LOCAL_QUORUM``
+  A majority of the replicas in the local datacenter (whichever datacenter the coordinator is in) must respond.
+
+``EACH_QUORUM``
+  A majority of the replicas in each datacenter must respond.
+
+``LOCAL_ONE``
+  Only a single replica must respond.  In a multi-datacenter cluster, this also guarantees that read requests are not
+  sent to replicas in a remote datacenter.
+
+``ANY``
+  A single replica may respond, or the coordinator may store a hint. If a hint is stored, the coordinator will later
+  attempt to replay the hint and deliver the mutation to the replicas.  This consistency level is only accepted for
+  write operations.
+
+Write operations are always sent to all replicas, regardless of consistency level. The consistency level simply
+controls how many responses the coordinator waits for before responding to the client.
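+
+As an illustration, in ``cqlsh`` the ``CONSISTENCY`` command sets the level used for subsequent requests; client
+drivers typically expose an equivalent per-statement setting.  The keyspace and table below are hypothetical::
+
+    CONSISTENCY LOCAL_QUORUM;
+    INSERT INTO example_ks.users (id, name) VALUES (uuid(), 'alice');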
+
+For read operations, the coordinator generally only issues read commands to enough replicas to satisfy the consistency
+level. There are a couple of exceptions to this:
+
+- Speculative retry may issue a redundant read request to an extra replica if the other replicas have not responded
+  within a specified time window.
+- Based on ``read_repair_chance`` and ``dclocal_read_repair_chance`` (part of a table's schema), read requests may be
+  randomly sent to all replicas in order to repair potentially inconsistent data.  A sketch of adjusting these
+  table-level options is shown below.
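+
+Speculative retry and the read repair chance options are both part of the table schema.  The table name below is a
+placeholder and the values are only illustrative::
+
+    ALTER TABLE example_ks.users
+        WITH speculative_retry = '99PERCENTILE'
+        AND read_repair_chance = 0.0
+        AND dclocal_read_repair_chance = 0.1;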
+
+Picking Consistency Levels
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is common to pick read and write consistency levels that are high enough to overlap, resulting in "strong"
+consistency.  This is typically expressed as ``W + R > RF``, where ``W`` is the number of replicas required by the
+write consistency level, ``R`` is the number required by the read consistency level, and ``RF`` is the replication
+factor.  For example, if ``RF = 3``, a ``QUORUM`` request will
+require responses from at least two of the three replicas.  If ``QUORUM`` is used for both writes and reads, at least
+one of the replicas is guaranteed to participate in *both* the write and the read request, which in turn guarantees that
+the latest write will be read. In a multi-datacenter environment, ``LOCAL_QUORUM`` can be used to provide a weaker but
+still useful guarantee: reads are guaranteed to see the latest write from within the same datacenter.
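+
+Continuing the ``RF = 3`` example, ``QUORUM`` requires 2 of the 3 replicas, so pairing ``QUORUM`` writes with
+``QUORUM`` reads satisfies ``W + R > RF`` (2 + 2 > 3).  In ``cqlsh`` this could look like the following sketch, where
+the keyspace, table, and values are placeholders::
+
+    CONSISTENCY QUORUM;
+    INSERT INTO example_ks.users (id, name)
+        VALUES (c37d661d-7e61-49ea-96a5-68c34e83db3a, 'alice');
+    SELECT name FROM example_ks.users
+        WHERE id = c37d661d-7e61-49ea-96a5-68c34e83db3a;  -- guaranteed to see the write above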
+
+If this type of strong consistency isn't required, lower consistency levels like ``ONE`` may be used to improve
+throughput, latency, and availability.
 
 Storage Engine
 --------------