Posted to pr@cassandra.apache.org by GitBox <gi...@apache.org> on 2022/05/12 11:13:09 UTC

[GitHub] [cassandra-website] adelapena commented on a diff in pull request #128: CASSANDRA-17621 May 2022 blog "Apache Cassandra 4.1 Features: Guardrails Framework"

adelapena commented on code in PR #128:
URL: https://github.com/apache/cassandra-website/pull/128#discussion_r871248911


##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-Features-Guardrails-Framework.adoc:
##########
@@ -0,0 +1,134 @@
+= Apache Cassandra 4.1 Features: Guardrails Framework
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: May 12, 2022
+:page-post-author: Andrés de la Peña
+:description: New Guardrails Framework in Apache Cassandra 4.1
+:keywords: 4.1, features, guardrails
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@jjying[JJ Ying on Unsplash^]
+image::blog/apache-cassandra-4.1-features-guardrails-framework-unsplash-jj-ying.jpg[New Guardrails framework]
+
+In Apache Cassandra 4.1.0, we are introducing a new framework called Guardrails. The framework helps operators avoid certain configuration and usage pitfalls that can degrade the performance and availability of an Apache Cassandra cluster when taken to scale. 
+
+For example, on the schema side, users can create too many tables or secondary indexes, leading to excessive use of resources. On the query side, users can run queries touching too many partitions that might involve all nodes in the cluster. Even worse, they can simply run a query using costly replica-side filtering, potentially reading all the table contents into memory on all nodes across the cluster. All these are well-known Cassandra anti-patterns, and administrators have to be vigilant about preventing users from incurring them. Even if one is perfectly aware of correct usage patterns, it’s easy to lose track of things like the size of non-frozen collections.
+
+The new framework allows operators to restrict how Cassandra is used by:
+
+* Disabling certain features.
+* Disallowing some specific values.
+* Defining soft and hard limits to certain database magnitudes.
+
+=== Configuring Guardrails
+
+Guardrails are defined as regular properties in the Cassandra configuration file, https://cassandra.apache.org/doc/latest/cassandra/configuration/cass_yaml_file.html[`cassandra.yaml`]. They look like:

Review Comment:
   This link goes to the project's own documentation, so I wonder if it should be ``link:/doc/latest/cassandra/configuration/cass_yaml_file.html[`cassandra.yaml`]``, without the `https://cassandra.apache.org` prefix:
   ```suggestion
   Guardrails are defined as regular properties in the Cassandra configuration file, link:/doc/latest/cassandra/configuration/cass_yaml_file.html[`cassandra.yaml`]. They look like:
   ```



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-Features-Guardrails-Framework.adoc:
##########
@@ -0,0 +1,134 @@
+= Apache Cassandra 4.1 Features: Guardrails Framework
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: May 12, 2022
+:page-post-author: Andrés de la Peña
+:description: New Guardrails Framework in Apache Cassandra 4.1
+:keywords: 4.1, features, guardrails
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@jjying[JJ Ying on Unsplash^]
+image::blog/apache-cassandra-4.1-features-guardrails-framework-unsplash-jj-ying.jpg[New Guardrails framework]
+
+In Apache Cassandra 4.1.0, we are introducing a new framework called Guardrails. The framework helps operators avoid certain configuration and usage pitfalls that can degrade the performance and availability of an Apache Cassandra cluster when taken to scale. 
+
+For example, on the schema side, users can create too many tables or secondary indexes, leading to excessive use of resources. On the query side, users can run queries touching too many partitions that might involve all nodes in the cluster. Even worse, they can simply run a query using costly replica-side filtering, potentially reading all the table contents into memory on all nodes across the cluster. All these are well-known Cassandra anti-patterns, and administrators have to be vigilant about preventing users from incurring them. Even if one is perfectly aware of correct usage patterns, it’s easy to lose track of things like the size of non-frozen collections.
+
+The new framework allows operators to restrict how Cassandra is used by:
+
+* Disabling certain features.
+* Disallowing some specific values.
+* Defining soft and hard limits to certain database magnitudes.
+
+=== Configuring Guardrails
+
+Guardrails are defined as regular properties in the Cassandra configuration file, https://cassandra.apache.org/doc/latest/cassandra/configuration/cass_yaml_file.html[`cassandra.yaml`]. They look like:
+
+```
+tables_warn_threshold: -1
+tables_fail_threshold: -1
+secondary_indexes_per_table_warn_threshold: -1
+secondary_indexes_per_table_fail_threshold: -1
+allow_filtering_enabled: true
+partition_keys_in_select_warn_threshold: -1
+partition_keys_in_select_fail_threshold: -1
+collection_size_warn_threshold:
+collection_size_fail_threshold:
+```
+
+Note that this is not an exhaustive list of all the available guardrails. There are many more, and new ones are under development, but this does give you an idea of the potential options. Note also that all guardrails are disabled by default. When enabled, a guardrail configuration might resemble the following: 
+
+```
+tables_warn_threshold: 5
+tables_fail_threshold: 10
+secondary_indexes_per_table_warn_threshold: 5
+secondary_indexes_per_table_fail_threshold: 10
+allow_filtering_enabled: false
+partition_keys_in_select_warn_threshold: 10
+partition_keys_in_select_fail_threshold: 20
+collection_size_warn_threshold: 10MiB
+collection_size_fail_threshold: 20MiB
+```
+
+The guardrails defined in `cassandra.yaml` are applied as the node starts. It’s also possible to dynamically update the guardrails configuration through JMX at runtime. All guardrails are grouped under the MBean named `org.apache.cassandra.db.Guardrails`. There are plans to also support dynamically updating guardrails through virtual tables, although this option is not yet available.
+
+=== Guardrails in Action
+
+Most guardrails are checked at the https://cassandra.apache.org/doc/latest/cassandra/cql/index.html[CQL layer], without involving the https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html[storage engine] or https://cassandra.apache.org/doc/latest/cassandra/architecture/dynamo.html[additional replicas]. Boolean guardrails for disabling features, such as `allow_filtering_enabled`, abort the operations attempting to use the disabled feature. For example, if the boolean guardrail for queries using filtering is disabled (`allow_filtering_enabled: false`) we will see a failure every time we try to run one of those queries, and the query won’t run:

Review Comment:
   I think we should use relative paths for the links to our own doc:
   ```suggestion
   Most guardrails are checked at the link:/doc/latest/cassandra/cql/index.html[CQL layer], without involving the link:/doc/latest/cassandra/architecture/storage_engine.html[storage engine] or link:/doc/latest/cassandra/architecture/dynamo.html[additional replicas]. Boolean guardrails for disabling features, such as `allow_filtering_enabled`, abort the operations attempting to use the disabled feature. For example, if the boolean guardrail for queries using filtering is disabled (`allow_filtering_enabled: false`) we will see a failure every time we try to run one of those queries, and the query won’t run:
   ```
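
If more posts carry the same absolute links, the rewrite could be scripted rather than done by hand. The following is only a rough sketch, not something the website build provides: the function name `relativize_doc_links` and the `SITE_ORIGIN` constant are made up for illustration, and it assumes our own docs are the only links under `/doc/` on that origin.

```python
import re

# Assumption: the project's own documentation is served from this origin.
SITE_ORIGIN = "https://cassandra.apache.org"

# Matches an AsciiDoc URL macro that points into the site's own /doc/ tree,
# e.g. https://cassandra.apache.org/doc/latest/...html[link text].
# The lookbehind skips links that already use the link: macro.
_ABSOLUTE_DOC_LINK = re.compile(
    r"(?<!link:)" + re.escape(SITE_ORIGIN) + r"(/doc/[^\s\[\]]+)(\[[^\]]*\])"
)


def relativize_doc_links(line: str) -> str:
    """Rewrite absolute links to the site's docs as root-relative link: macros."""
    return _ABSOLUTE_DOC_LINK.sub(r"link:\1\2", line)


if __name__ == "__main__":
    sample = (
        "checked at the https://cassandra.apache.org/doc/latest/cassandra/"
        "cql/index.html[CQL layer]"
    )
    print(relativize_doc_links(sample))
```

External links, such as the Unsplash image credit, do not match the pattern and are left untouched.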



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

