You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by ru...@apache.org on 2020/03/03 23:59:22 UTC

[cassandra] branch trunk updated: Added documentation for transient replication.

This is an automated email from the ASF dual-hosted git repository.

rustyrazorblade pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
     new 9688d7d  Added documentation for transient replication.
9688d7d is described below

commit 9688d7d490ba6581c203c229b945cca5b7c657f4
Author: dvohra <dv...@yahoo.com>
AuthorDate: Mon Jan 6 18:16:44 2020 -0800

    Added documentation for transient replication.
    
    Patch by dvohra; Reviewed by Jon Haddad for CASSANDRA-15476.
---
 doc/source/_templates/indexcontent.html |   2 +-
 doc/source/new/index.rst                |   1 +
 doc/source/new/transientreplication.rst | 155 ++++++++++++++++++++++++++++++++
 3 files changed, 157 insertions(+), 1 deletion(-)

diff --git a/doc/source/_templates/indexcontent.html b/doc/source/_templates/indexcontent.html
index 7f7eede..5851258 100644
--- a/doc/source/_templates/indexcontent.html
+++ b/doc/source/_templates/indexcontent.html
@@ -1,6 +1,6 @@
 {% extends "defindex.html" %}
 {% block tables %}
-<div id="wipwarning">This documentation is currently a work-in-progress and contains a number of TODO sections.
+<div id="wipwarning">This documentation is a work-in-progress.
     <a href="{{ pathto("bugs") }}">Contributions</a> are welcome.</div>
 
 <h3>Main documentation</h3>
diff --git a/doc/source/new/index.rst b/doc/source/new/index.rst
index c88120d..5ef867b 100644
--- a/doc/source/new/index.rst
+++ b/doc/source/new/index.rst
@@ -28,4 +28,5 @@ This section covers the new features in Apache Cassandra 4.0.
    fqllogging
    messaging
    streaming
+   transientreplication
    
diff --git a/doc/source/new/transientreplication.rst b/doc/source/new/transientreplication.rst
new file mode 100644
index 0000000..438f437
--- /dev/null
+++ b/doc/source/new/transientreplication.rst
@@ -0,0 +1,155 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Transient Replication
+---------------------
+
+**Note**:
+
+Transient Replication (`CASSANDRA-14404
+<https://issues.apache.org/jira/browse/CASSANDRA-14404>`_) is an experimental feature designed for expert Apache Cassandra users who are able to validate every aspect of the database for their application and deployment.
+That means being able to check that operations like reads, writes, decommission, remove, rebuild, repair, and replace all work with your queries, data, configuration, operational practices, and availability requirements.
+Apache Cassandra 4.0 has the initial implementation of transient replication. Future releases of Cassandra will make this feature suitable for a wider audience.
+It is anticipated that a future version will support monotonic reads with transient replication as well as LWT, logged batches, and counters. Being experimental, Transient replication is **not** recommended for production use.
+
+Objective
+^^^^^^^^^
+
+The objective of transient replication is to decouple storage requirements from data redundancy (or consensus group size) using incremental repair, in order to reduce storage overhead.
+Certain nodes act as full replicas (storing all the data for a given token range), and some nodes act as transient replicas, storing only unrepaired data for the same token ranges.
+
+The optimization that is made possible with transient replication is called "Cheap quorums", which implies that data redundancy is increased without corresponding increase in storage usage.
+
+Transient replication is useful when sufficient full replicas are unavailable to receive and store all the data.
+Transient replication allows you to configure a subset of replicas to only replicate data that hasn't been incrementally repaired.
+As an optimization, we can avoid writing data to a transient replica if we have successfully written data to the full replicas.
+
+After incremental repair, transient data stored on transient replicas can be discarded.
+
+Enabling Transient Replication
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Transient replication is not enabled by default.  Transient replication must be enabled on each node in a cluster separately by setting the following configuration property in ``cassandra.yaml``.
+
+::
+
+ enable_transient_replication: true
+
+Transient replication may be configured with both ``SimpleStrategy`` and ``NetworkTopologyStrategy``. Transient replication is configured by setting replication factor as ``<total_replicas>/<transient_replicas>``.
+
+As an example, create a keyspace with replication factor (RF) 3.
+
+::
+
+ CREATE KEYSPACE CassandraKeyspaceSimple WITH replication = {'class': 'SimpleStrategy',
+ 'replication_factor' : 4/1};
+
+
+As another example, ``some_keysopace keyspace`` will have 3 replicas in DC1, 1 of which is transient, and 5 replicas in DC2, 2 of which are transient:
+
+::
+
+ CREATE KEYSPACE some_keysopace WITH replication = {'class': 'NetworkTopologyStrategy',
+ 'DC1' : '3/1'', 'DC2' : '5/2'};
+
+Transiently replicated keyspaces only support tables with ``read_repair`` set to ``NONE``.
+
+Important Restrictions:
+
+- RF cannot be altered while some endpoints are not in a normal state (no range movements).
+- You can't add full replicas if there are any transient replicas. You must first remove all transient replicas, then change the # of full replicas, then add back the transient replicas.
+- You can only safely increase number of transients one at a time with incremental repair run in between each time.
+
+
+Additionally, transient replication cannot be used for:
+
+- Monotonic Reads
+- Lightweight Transactions (LWTs)
+- Logged Batches
+- Counters
+- Keyspaces using materialized views
+- Secondary indexes (2i)
+
+Cheap Quorums
+^^^^^^^^^^^^^
+
+Cheap quorums are a set of optimizations on the write path to avoid writing to transient replicas unless sufficient full replicas are not available to satisfy the requested consistency level.
+Hints are never written for transient replicas.  Optimizations on the read path prefer reading from transient replicas.
+When writing at quorum to a table configured to use transient replication the quorum will always prefer available full
+replicas over transient replicas so that transient replicas don't have to process writes. Tail latency is reduced by
+rapid write protection (similar to rapid read protection) when full replicas are slow or unavailable by sending writes
+to transient replicas. Transient replicas can serve reads faster as they don't have to do anything beyond bloom filter
+checks if they have no data. With vnodes and large cluster sizes they will not have a large quantity of data
+even for failure of one or more full replicas where transient replicas start to serve a steady amount of write traffic
+for some of their transiently replicated ranges.
+
+Speculative Write Option
+^^^^^^^^^^^^^^^^^^^^^^^^
+The ``CREATE TABLE`` adds an option ``speculative_write_threshold`` for  use with transient replicas. The option is of type ``simple`` with default value as ``99PERCENTILE``. When replicas are slow or unresponsive  ``speculative_write_threshold`` specifies the threshold at which a cheap quorum write will be upgraded to include transient replicas.
+
+
+Pending Ranges and Transient Replicas
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Pending ranges refers to the movement of token ranges between transient replicas. When a transient range is moved, there
+will be a period of time where both transient replicas would need to receive any write intended for the logical
+transient replica so that after the movement takes effect a read quorum is able to return a response. Nodes are *not*
+temporarily transient replicas during expansion. They stream data like a full replica for the transient range before they
+can serve reads. A pending state is incurred similar to how there is a pending state for full replicas. Transient replicas
+also always receive writes when they are pending. Pending transient ranges are sent a bit more data and reading from
+them is avoided.
+
+
+Read Repair and Transient Replicas
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Read repair never attempts to repair a transient replica. Reads will always include at least one full replica.
+They should also prefer transient replicas where possible. Range scans ensure the entire scanned range performs
+replica selection that satisfies the requirement that every range scanned includes one full replica. During incremental
+& validation repair handling, at transient replicas anti-compaction does not output any data for transient ranges as the
+data will be dropped after repair, and  transient replicas never have data streamed to them.
+
+
+Transitioning between Full Replicas and Transient Replicas
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The additional state transitions that transient replication introduces requires streaming and ``nodetool cleanup`` to
+behave differently.  When data is streamed it is ensured that it is streamed from a full replica and not a transient replica.
+
+Transitioning from not replicated to transiently replicated means that a node must stay pending until the next incremental
+repair completes at which point the data for that range is known to be available at full replicas.
+
+Transitioning from transiently replicated to fully replicated requires streaming from a full replica and is identical
+to how data is streamed when transitioning from not replicated to replicated. The transition is managed so the transient
+replica is not read from as a full replica until streaming completes. It can be used immediately for a write quorum.
+
+Transitioning from fully replicated to transiently replicated requires cleanup to remove repaired data from the transiently
+replicated range to reclaim space. It can be used immediately for a write quorum.
+
+Transitioning from transiently replicated to not replicated requires cleanup to be run to remove the formerly transiently replicated data.
+
+When transient replication is in use ring changes are supported including   add/remove node, change RF, add/remove DC.
+
+
+Transient Replication supports EACH_QUORUM
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+(`CASSANDRA-14727
+<https://issues.apache.org/jira/browse/CASSANDRA-14727>`_) adds support for Transient Replication support for ``EACH_QUORUM``. Per (`CASSANDRA-14768
+<https://issues.apache.org/jira/browse/CASSANDRA-14768>`_), we ensure we write to at least a ``QUORUM`` of nodes in every DC,
+regardless of how many responses we need to wait for and our requested consistency level. This is to minimally surprise
+users with transient replication; with normal writes, we soft-ensure that we reach ``QUORUM`` in all DCs we are able to,
+by writing to every node; even if we don't wait for ACK, we have in both cases sent sufficient messages.


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org