
Posted to pr@cassandra.apache.org by GitBox <gi...@apache.org> on 2020/01/20 05:23:55 UTC

[GitHub] [cassandra] dvohra commented on issue #419: Hints

URL: https://github.com/apache/cassandra/pull/419#issuecomment-576111592

Joey,
Thanks for reorganizing the Hints page and for the additions.
Google Docs (the editor) and Google Drawings (the drawing tool) were suggested or approved by project managers Nate and Dinesh (cc'd), and Google tools would be most appropriate as the project is sponsored by Google.
I shall make the slight edits as suggested; the pull request has to be merged by someone other than me.
Regards,
Deepak

On Monday, January 20, 2020, 02:42:30 a.m. UTC, Joseph Lynch <no...@github.com> wrote:

@jolynch requested changes on this pull request.

Overall this is a great start.

I've left some comments and started a branch based off yours in dvohra/cassandra@hints...jolynch:hints with my suggestions. Feel free to pull them in or not.

General comments

- I'd recommend using https://www.mathcha.io/editor to make your diagrams instead of Google Docs. It is free as well, and in my opinion it is easier to use and the results look more professional (and it can export svgs). Also prefer svgs or pngs to jpgs.
- I don't see where the figures are used. If you like, I can make mathcha versions of them.
- I think you can cut a lot of copy in this page. Try to trim any sections you don't think are strictly necessary to explain hints.
- Replace the use of === with --- unless you want them to be top level sections.

In doc/source/operating/hints.rst:
> +.. with the License. You may obtain a copy of the License at
+..
+.. http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Hints
+=====
+
+Hints are a type of repair during a write operation. At times a write or an update cannot be replicated to all nodes satisfying the replication factor because a replica node is unavailable. Under such a condition the mutation (a write or update) is stored temporarily on the coordinator node in its filesystem.

Suggested re-wording to something like the following:
Hints are a data repair technique applied during write operations. When
replica nodes are unavailable to accept a mutation, either due to failure or
more commonly routine maintenance, coordinators attempting to write to those
replicas store temporary hints on their local filesystem for later application
to the unavailable replica. Hints are an important way to help reduce the
duration of data inconsistency between replicas as they replay quickly after
unavailable nodes return to the ring, however they are best effort and do not
guarantee eventual consistency like :ref:`anti-entropy repair <repair>` does.

In doc/source/operating/hints.rst:
> +.. http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Hints
+=====
+
+Hints are a type of repair during a write operation. At times a write or an update cannot be replicated to all nodes satisfying the replication factor because a replica node is unavailable. Under such a condition the mutation (a write or update) is stored temporarily on the coordinator node in its filesystem.
+
+Hints are metadata associated with a mutation (a write or update) indicating that the mutation is not placed on a replica node (the target node) it is meant to be placed on because the node is temporarily unavailable, or is unresponsive. Hints are used to implement the eventual consistency guarantee that all updates are eventually received by all replicas and all replicas are eventually made consistent. When the replica node becomes available the hints are replayed on the node.

I'd slightly modify
Hints are used to implement the eventual consistency guarantee ...

to be
Hints are one of the primary ways Cassandra implements the eventual consistency guarantee ...

In doc/source/operating/hints.rst:
> +.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Hints
+=====
+
+Hints are a type of repair during a write operation. At times a write or an update cannot be replicated to all nodes satisfying the replication factor because a replica node is unavailable. Under such a condition the mutation (a write or update) is stored temporarily on the coordinator node in its filesystem.
+
+Hints are metadata associated with a mutation (a write or update) indicating that the mutation is not placed on a replica node (the target node) it is meant to be placed on because the node is temporarily unavailable, or is unresponsive. Hints are used to implement the eventual consistency guarantee that all updates are eventually received by all replicas and all replicas are eventually made consistent. When the replica node becomes available the hints are replayed on the node.
+
+As a primer on how replicas are placed in a cluster, Apache Cassandra replicates data to provide fault tolerance, high availability and durability. Cassandra partitions data across the cluster using consistent hashing in which a hash function is used on the partition keys to generate consistently ordered hash values (or tokens). An abstract ring represents the complete hash value range (token range) of the keys stored with each node in the cluster being assigned a certain subset range of hash values (range of tokens) it can store. The list of nodes responsible for a particular key is called its preference list. The preference list may include virtual nodes as a virtual node is also a node albeit an abstract node and not a physical node. Virtual nodes may need to be skipped to create a preference list in which the first N (N being the replication factor) nodes taken clockwise in the consistent hashing ring are all distinct physical nodes. All nodes in a cluster know which node/s should be in the preference list for a given key. The node that receives a request for a write operation (key/value data) forwards the request to the replica node that is in the preference list for the key. The node becomes a coordinator node and coordinates the reads and writes.

General feedback: Can we link to one of the architecture pages here instead of repeating it?

Copy feedback (my opinion):

- I'd nix some of the expository copy like "As a primer on how replicas are placed in a cluster,".
- I don't think you need to go into virtual nodes to explain hints. There is a set of physical endpoints which should be part of the replica set for a key, and when an endpoint (or replica) is unavailable, hints have to be stored for it.

In doc/source/operating/hints.rst:
> +=====
+
+Hints are a type of repair during a write operation. At times a write or an update cannot be replicated to all nodes satisfying the replication factor because a replica node is unavailable. Under such a condition the mutation (a write or update) is stored temporarily on the coordinator node in its filesystem.
+
+Hints are metadata associated with a mutation (a write or update) indicating that the mutation is not placed on a replica node (the target node) it is meant to be placed on because the node is temporarily unavailable, or is unresponsive. Hints are used to implement the eventual consistency guarantee that all updates are eventually received by all replicas and all replicas are eventually made consistent. When the replica node becomes available the hints are replayed on the node.
+
+As a primer on how replicas are placed in a cluster, Apache Cassandra replicates data to provide fault tolerance, high availability and durability. Cassandra partitions data across the cluster using consistent hashing in which a hash function is used on the partition keys to generate consistently ordered hash values (or tokens). An abstract ring represents the complete hash value range (token range) of the keys stored with each node in the cluster being assigned a certain subset range of hash values (range of tokens) it can store. The list of nodes responsible for a particular key is called its preference list. The preference list may include virtual nodes as a virtual node is also a node albeit an abstract node and not a physical node. Virtual nodes may need to be skipped to create a preference list in which the first N (N being the replication factor) nodes taken clockwise in the consistent hashing ring are all distinct physical nodes. All nodes in a cluster know which node/s should be in the preference list for a given key. The node that receives a request for a write operation (key/value data) forwards the request to the replica node that is in the preference list for the key. The node becomes a coordinator node and coordinates the reads and writes.
+
+Why are hints needed?
+=====================
+
+Hints reduce the inconsistency window caused by temporary node unavailability.
+
+Consider that an update or mutation is to be made using the following configuration:
+
+- Consistency level : 2

Consistency level: LOCAL_QUORUM (2/3)

Hints, like read-repair, are not an alternative to performing full repair, but they do help reduce the duration of inconsistency between replicas.
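
As a quick illustration of where the "2/3" above comes from, here is a minimal sketch of the usual quorum arithmetic (more than half of the replicas must acknowledge). The function name `quorum` is made up for this comment and is not a Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM-style request.

    Hypothetical helper for illustration only: integer-divide the
    replication factor by two and add one.
    """
    return replication_factor // 2 + 1


# With three replicas, two acknowledgements are enough, so one replica
# can miss the write and would need a hint to catch up later.
for rf in (1, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}")
```

For LOCAL_QUORUM the replication factor in question is that of the local datacenter.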

In doc/source/operating/hints.rst:
> +
+Hints
+=====
+
+Hints are a type of repair during a write operation. At times a write or an update cannot be replicated to all nodes satisfying the replication factor because a replica node is unavailable. Under such a condition the mutation (a write or update) is stored temporarily on the coordinator node in its filesystem.
+
+Hints are metadata associated with a mutation (a write or update) indicating that the mutation is not placed on a replica node (the target node) it is meant to be placed on because the node is temporarily unavailable, or is unresponsive. Hints are used to implement the eventual consistency guarantee that all updates are eventually received by all replicas and all replicas are eventually made consistent. When the replica node becomes available the hints are replayed on the node.
+
+As a primer on how replicas are placed in a cluster, Apache Cassandra replicates data to provide fault tolerance, high availability and durability. Cassandra partitions data across the cluster using consistent hashing in which a hash function is used on the partition keys to generate consistently ordered hash values (or tokens). An abstract ring represents the complete hash value range (token range) of the keys stored with each node in the cluster being assigned a certain subset range of hash values (range of tokens) it can store. The list of nodes responsible for a particular key is called its preference list. The preference list may include virtual nodes as a virtual node is also a node albeit an abstract node and not a physical node. Virtual nodes may need to be skipped to create a preference list in which the first N (N being the replication factor) nodes taken clockwise in the consistent hashing ring are all distinct physical nodes. All nodes in a cluster know which node/s should be in the preference list for a given key. The node that receives a request for a write operation (key/value data) forwards the request to the replica node that is in the preference list for the key. The node becomes a coordinator node and coordinates the reads and writes.
+
+Why are hints needed?
+=====================
+
+Hints reduce the inconsistency window caused by temporary node unavailability.
+
+Consider that an update or mutation is to be made using the following configuration:

Consider that a mutation is made with the following configuration

In doc/source/operating/hints.rst:
> +
+As a primer on how replicas are placed in a cluster, Apache Cassandra replicates data to provide fault tolerance, high availability and durability. Cassandra partitions data across the cluster using consistent hashing in which a hash function is used on the partition keys to generate consistently ordered hash values (or tokens). An abstract ring represents the complete hash value range (token range) of the keys stored with each node in the cluster being assigned a certain subset range of hash values (range of tokens) it can store. The list of nodes responsible for a particular key is called its preference list. The preference list may include virtual nodes as a virtual node is also a node albeit an abstract node and not a physical node. Virtual nodes may need to be skipped to create a preference list in which the first N (N being the replication factor) nodes taken clockwise in the consistent hashing ring are all distinct physical nodes. All nodes in a cluster know which node/s should be in the preference list for a given key. The node that receives a request for a write operation (key/value data) forwards the request to the replica node that is in the preference list for the key. The node becomes a coordinator node and coordinates the reads and writes.
+
+Why are hints needed?
+=====================
+
+Hints reduce the inconsistency window caused by temporary node unavailability.
+
+Consider that an update or mutation is to be made using the following configuration:
+
+- Consistency level : 2
+- Replication factor: 3
+- Replication strategy: SimpleStrategy
+- Number of nodes in cluster: 5
+
+The update or mutation is sent to a node (node A) in the cluster, and is meant to be forwarded to three other nodes, the replica nodes B, C and D. The node that receives the request is the proxy node and becomes the coordinator of the request. Under normal operation the update gets sent to the three replica nodes and the coordinator receives the response from the three nodes satisfying the consistency level. But suppose node B is down and unavailable. The update is sent to nodes C and D and a response returned to the coordinator, again satisfying the consistency level of 2. But that is not the end of the request. Because the replica mutation is meant for replica node B also, a hint is stored by the coordinator node in the local filesystem indicating that the update or mutation is also to be replicated on node B. The coordinator node waits for 3 hours by default (as set with ``max_hint_window_in_ms``). If node B becomes available within 3 hours the coordinator sends the hint to node B and the hint is replayed on node B, eventually making all replicas consistent. Such a transfer of an update using hints is called a hinted handoff. Hinted handoff is used to ensure that read and write operations are not failed and the consistency, availability and durability guarantees are not compromised. We still need to satisfy the consistency level, because hints & hinted handoffs are not used to satisfy the write consistency level unless the consistency level is ``ANY``. If the replica node for which a hint is generated does not become available within 3 hours, or the ``max_hint_window_in_ms``, the hint is deleted and a full or read repair becomes necessary.

A couple of suggestions:

- I'd omit replication strategy and number of nodes in the cluster for this example. The only thing we need to know is that we have a LOCAL_QUORUM request going to three replicas and one of the replicas does not acknowledge the write.
- "are not failed and the consistency, availability and durability guarantees are not compromised": I suggest you re-word this to something like "hints ensure eventual consistency".
- Can you structure this as an ordered timeline with a diagram instead of a large paragraph? I think something like this diagram would help explain the concept.
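
To make the timeline idea concrete, here is a rough toy model of the sequence I have in mind: write, missed replica, stored hint, replay. Everything in it (`Replica`, `Coordinator`, `write`, `replay_hints`) is a hypothetical name for illustration, not Cassandra's implementation, and it deliberately ignores the hint window, timeouts and throttling.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def apply(self, key, value):
        self.data[key] = value


@dataclass
class Coordinator:
    replicas: list
    hints: list = field(default_factory=list)  # (target name, key, value)

    def write(self, key, value, required_acks):
        """Send the mutation to every replica; store a hint for each one that is down."""
        acks = 0
        for replica in self.replicas:
            if replica.up:
                replica.apply(key, value)
                acks += 1
            else:
                self.hints.append((replica.name, key, value))
        return acks >= required_acks

    def replay_hints(self):
        """Deliver stored hints to replicas that are back up; keep the rest."""
        by_name = {r.name: r for r in self.replicas}
        remaining = []
        for target, key, value in self.hints:
            if by_name[target].up:
                by_name[target].apply(key, value)
            else:
                remaining.append((target, key, value))
        self.hints = remaining


# Timeline: (1) B is down, (2) the write succeeds with 2 of 3 acks,
# (3) a hint for B is stored, (4) B returns, (5) the hint is replayed.
b, c, d = Replica("B", up=False), Replica("C"), Replica("D")
coordinator = Coordinator([b, c, d])
write_ok = coordinator.write("k", "v", required_acks=2)
pending = list(coordinator.hints)
b.up = True
coordinator.replay_hints()
```

Each numbered step in the final comment could become one item in the ordered list in the docs.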

In doc/source/operating/hints.rst:
> +
+Hints reduce the inconsistency window caused by temporary node unavailability.
+
+Consider that an update or mutation is to be made using the following configuration:
+
+- Consistency level : 2
+- Replication factor: 3
+- Replication strategy: SimpleStrategy
+- Number of nodes in cluster: 5
+
+The update or mutation is sent to a node (node A) in the cluster, and is meant to be forwarded to three other nodes, the replica nodes B, C and D. The node that receives the request is the proxy node and becomes the coordinator of the request. Under normal operation the update gets sent to the three replica nodes and the coordinator receives the response from the three nodes satisfying the consistency level. But suppose node B is down and unavailable. The update is sent to nodes C and D and a response returned to the coordinator, again satisfying the consistency level of 2. But that is not the end of the request. Because the replica mutation is meant for replica node B also, a hint is stored by the coordinator node in the local filesystem indicating that the update or mutation is also to be replicated on node B. The coordinator node waits for 3 hours by default (as set with ``max_hint_window_in_ms``). If node B becomes available within 3 hours the coordinator sends the hint to node B and the hint is replayed on node B, eventually making all replicas consistent. Such a transfer of an update using hints is called a hinted handoff. Hinted handoff is used to ensure that read and write operations are not failed and the consistency, availability and durability guarantees are not compromised. We still need to satisfy the consistency level, because hints & hinted handoffs are not used to satisfy the write consistency level unless the consistency level is ``ANY``. If the replica node for which a hint is generated does not become available within 3 hours, or the ``max_hint_window_in_ms``, the hint is deleted and a full or read repair becomes necessary.
+
+Hints for Timed Out Write Requests
+==================================
+
+Hints are also stored for write requests that are timed out. The ``write_request_timeout_in_ms`` setting in ``cassandra.yaml`` configures the timeout for write requests.

write requests that time out

In doc/source/operating/hints.rst:
> +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Hints
+=====
+
+Hints are a type of repair during a write operation. At times a write or an update cannot be replicated to all nodes satisfying the replication factor because a replica node is unavailable. Under such a condition the mutation (a write or update) is stored temporarily on the coordinator node in its filesystem.
+
+Hints are metadata associated with a mutation (a write or update) indicating that the mutation is not placed on a replica node (the target node) it is meant to be placed on because the node is temporarily unavailable, or is unresponsive. Hints are used to implement the eventual consistency guarantee that all updates are eventually received by all replicas and all replicas are eventually made consistent. When the replica node becomes available the hints are replayed on the node.
+
+As a primer on how replicas are placed in a cluster, Apache Cassandra replicates data to provide fault tolerance, high availability and durability. Cassandra partitions data across the cluster using consistent hashing in which a hash function is used on the partition keys to generate consistently ordered hash values (or tokens). An abstract ring represents the complete hash value range (token range) of the keys stored with each node in the cluster being assigned a certain subset range of hash values (range of tokens) it can store. The list of nodes responsible for a particular key is called its preference list. The preference list may include virtual nodes as a virtual node is also a node albeit an abstract node and not a physical node. Virtual nodes may need to be skipped to create a preference list in which the first N (N being the replication factor) nodes taken clockwise in the consistent hashing ring are all distinct physical nodes. All nodes in a cluster know which node/s should be in the preference list for a given key. The node that receives a request for a write operation (key/value data) forwards the request to the replica node that is in the preference list for the key. The node becomes a coordinator node and coordinates the reads and writes.
+
+Why are hints needed?

Change this from = to - headers, and perhaps re-word the title to Hinted Handoff.

In doc/source/operating/hints.rst:
> +| |uncompressed. LZ4, Snappy, and Deflate | |
+| |compressors are supported. | |
++----------------------+-------------------------------------------+-----------------+
+
+Changing Max Hint Window at Runtime
+===================================
+
+Cassandra 4.0 has added support for changing ``max_hint_window_in_ms`` at runtime
+(`CASSANDRA-11720
+<https://issues.apache.org/jira/browse/CASSANDRA-11720>`_). The ``max_hint_window_in_ms`` configuration property in ``cassandra.yaml`` may be modified at runtime followed by a rolling restart. The default value of ``max_hint_window_in_ms`` is 3 hours.
+
+::
+
+ max_hint_window_in_ms: 10800000 # 3 hours
+
+The need to be able to modify ``max_hint_window_in_ms`` at runtime is explained with the following example. A larger node (in terms of data it holds) goes down. And it will take slightly more than ``max_hint_window_in_ms`` to fix it. The disk space to store some additional hints id available.

This is not clear, let's re-word it.

In doc/source/operating/hints.rst:
> +| | |data/hints |
++----------------------+-------------------------------------------+-----------------+
+|hints_flush_period_in |How often hints should be flushed from the | 10000 |
+|_ms |internal buffers to disk. Will *not* | |
+| |trigger fsync. | |
++----------------------+-------------------------------------------+-----------------+
+|max_hints_file_size |Maximum size for a single hints file, in | 128 |
+|_in_mb |megabytes. | |
++----------------------+-------------------------------------------+-----------------+
+|hints_compression |Compression to apply to the hint files. | LZ4Compress |
+| |If omitted, hints files will be written | |
+| |uncompressed. LZ4, Snappy, and Deflate | |
+| |compressors are supported. | |
++----------------------+-------------------------------------------+-----------------+
+
+Changing Max Hint Window at Runtime

Can we change this section to talk about when you may want more time for hints to play, instead of changing max hint window at runtime? It's actually somewhat rare for nodes to be down for more than three hours, but it's very common that hints playing at 1024 kbps cannot complete within 3 hours.

You could mention raising the hinted_handoff_throttle as well as raising the window to ensure hints are delivered.
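
A back-of-the-envelope sketch of why the throttle matters. This assumes the 1024 figure means kilobytes per second applied as a flat rate, which is a simplification on my part; the backlog size and the helper name `replay_seconds` are made up for illustration.

```python
def replay_seconds(backlog_mb: float, throttle_kb_per_s: float) -> float:
    """Seconds needed to replay a hint backlog at a flat throttle rate.

    Hypothetical helper: assumes the throttle is a constant KB/s rate
    and that nothing else limits delivery.
    """
    return backlog_mb * 1024 / throttle_kb_per_s


THREE_HOURS_S = 3 * 60 * 60

backlog_mb = 20 * 1024  # assumed 20 GB of stored hints
for throttle in (1024, 4096):
    seconds = replay_seconds(backlog_mb, throttle)
    verdict = "within" if seconds <= THREE_HOURS_S else "longer than"
    print(f"{throttle} KB/s: {seconds / 3600:.2f} h ({verdict} 3 hours)")
```

Under these assumptions a 20 GB backlog takes well over three hours at 1024 KB/s but under two hours at four times that rate, which is the kind of trade-off the section could walk through.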
