Posted to dev@kafka.apache.org by GitBox <gi...@apache.org> on 2021/03/01 15:14:01 UTC

[GitHub] [kafka-site] miguno opened a new pull request #334: KAFKA-12393: Document multi-tenancy considerations

miguno opened a new pull request #334:
URL: https://github.com/apache/kafka-site/pull/334


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [kafka-site] miguno commented on a change in pull request #334: KAFKA-12393: Document multi-tenancy considerations

Posted by GitBox <gi...@apache.org>.

miguno commented on a change in pull request #334:
URL: https://github.com/apache/kafka-site/pull/334#discussion_r585640800



##########
File path: 27/ops.html
##########
@@ -1090,7 +1090,157 @@ <h4 class="anchor-heading"><a id="georeplication-monitoring" class="anchor-link"
   </p>
 
 
-  <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a href="#config">6.4 Kafka Configuration</a></h3>
+  <h3 class="anchor-heading"><a id="multitenancy" class="anchor-link"></a><a href="#multitenancy">6.4 Multi-Tenancy</a></h3>
+
+  <h4 class="anchor-heading"><a id="multitenancy-overview" class="anchor-link"></a><a href="#multitenancy-overview">Multi-Tenancy Overview</a></h4>
+
+  <p>
+    As a highly scalable event streaming platform, Kafka is used by many users as their central nervous system, connecting in real-time a wide range of different systems and applications from various teams and lines of businesses. Such multi-tenant cluster environments command proper control and management to ensure the peaceful coexistence of these different needs. This section highlights features and best practices to set up such shared environments, which should help you operate clusters that meet SLAs/OLAs and that minimize potential collateral damage caused by "noisy neighbors".
+  </p>
+
+  <p>
+    Multi-tenancy is a many-sided subject, including but not limited to:
+  </p>
+
+  <ul>
+    <li>Creating user spaces for tenants (sometimes called namespaces)</li>
+    <li>Configuring topics with data retention policies and more</li>
+    <li>Securing topics and clusters with encryption, authentication, and authorization</li>
+    <li>Isolating tenants with quotas and rate limits</li>
+    <li>Monitoring and metering</li>
+    <li>Inter-cluster data sharing (cf. geo-replication)</li>
+  </ul>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-naming" class="anchor-link"></a><a href="#multitenancy-topic-naming">Creating User Spaces (Namespaces) For Tenants With Topic Naming</a></h4>
+
+  <p>
+    Kafka administrators operating a multi-tenant cluster typically need to define user spaces for each tenant. For the purpose of this section, "user spaces" are a collection of topics, which are grouped together under the management of a single entity or user.
+  </p>
+
+  <p>
+    In Kafka, the main unit of data is the topic. Users can create and name each topic. They can also delete them, but it is not possible to rename a topic directly. Instead, to rename a topic, the user must create a new topic, move the messages from the original topic to the new, and then delete the original. With this in mind, it is recommended to define logical spaces, based on an hierarchical topic naming structure. This setup can then be combined with security features, such as prefixed ACLs, to isolate different spaces and tenants, while also minimizing the administrative overhead for securing the data in the cluster.
+  </p>
+
+  <p>
+    These logical user spaces can be grouped in different ways, and the concrete choice depends on how your organization prefers to use your Kafka clusters. The most common groupings are as follows.
+  </p>
+
+  <p>
+    <em>By team or organizational unit:</em> Here, the team is the main aggregator. In an organization where teams are the main user of the Kafka infrastructure, this might be the best grouping.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;organization&gt;.&lt;team&gt;.&lt;dataset&gt;.&lt;event-name&gt;</code><br />(e.g., "acme.infosec.telemetry.logins")</li>
+  </ul>
+
+  <p>
+    <em>By project or product:</em> Here, a team manages more than one project. Their credentials will be different for each project, so all the controls and settings will always be project related.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;project&gt;.&lt;product&gt;.&lt;event-name&gt;</code><br />(e.g., "mobility.payments.suspicious")</li>
+  </ul>
+
+  <p>
+    Certain information should normally not be put in a topic name, such as information that is likely to change over time (e.g., the name of the intended consumer) or that is a technical detail or metadata that is available elsewhere (e.g., the topic's partition count and other configuration settings).
+  </p>
+
+  <p>
+  To enforce a topic naming structure, it is useful to disable the Kafka feature to auto-create topics on demand by setting <code>auto.create.topics.enable=false</code> in the broker configuration. This stops users and applications from deliberately or inadvertently creating topics with arbitrary names, thus violating the naming structure. Then, you may want to put in place your own organizational process for controlled, yet automated creation of topics according to your naming convention, using scripting or your favorite automation toolkit.

Review comment:
       Ack and updated.





[GitHub] [kafka-site] miguno commented on pull request #334: KAFKA-12393: Document multi-tenancy considerations

Posted by GitBox <gi...@apache.org>.

miguno commented on pull request #334:
URL: https://github.com/apache/kafka-site/pull/334#issuecomment-788089637


   cc committer @rajinisivaram, who is the SME on this subject and has the most context



[GitHub] [kafka-site] dajac commented on a change in pull request #334: KAFKA-12393: Document multi-tenancy considerations

Posted by GitBox <gi...@apache.org>.

dajac commented on a change in pull request #334:
URL: https://github.com/apache/kafka-site/pull/334#discussion_r585427746



##########
File path: 27/ops.html
##########
@@ -1090,7 +1090,157 @@ <h4 class="anchor-heading"><a id="georeplication-monitoring" class="anchor-link"
   </p>
 
 
-  <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a href="#config">6.4 Kafka Configuration</a></h3>
+  <h3 class="anchor-heading"><a id="multitenancy" class="anchor-link"></a><a href="#multitenancy">6.4 Multi-Tenancy</a></h3>
+
+  <h4 class="anchor-heading"><a id="multitenancy-overview" class="anchor-link"></a><a href="#multitenancy-overview">Multi-Tenancy Overview</a></h4>
+
+  <p>
+    As a highly scalable event streaming platform, Kafka is used by many users as their central nervous system, connecting in real-time a wide range of different systems and applications from various teams and lines of businesses. Such multi-tenant cluster environments command proper control and management to ensure the peaceful coexistence of these different needs. This section highlights features and best practices to set up such shared environments, which should help you operate clusters that meet SLAs/OLAs and that minimize potential collateral damage caused by "noisy neighbors".
+  </p>
+
+  <p>
+    Multi-tenancy is a many-sided subject, including but not limited to:
+  </p>
+
+  <ul>
+    <li>Creating user spaces for tenants (sometimes called namespaces)</li>
+    <li>Configuring topics with data retention policies and more</li>
+    <li>Securing topics and clusters with encryption, authentication, and authorization</li>
+    <li>Isolating tenants with quotas and rate limits</li>
+    <li>Monitoring and metering</li>
+    <li>Inter-cluster data sharing (cf. geo-replication)</li>
+  </ul>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-naming" class="anchor-link"></a><a href="#multitenancy-topic-naming">Creating User Spaces (Namespaces) For Tenants With Topic Naming</a></h4>
+
+  <p>
+    Kafka administrators operating a multi-tenant cluster typically need to define user spaces for each tenant. For the purpose of this section, "user spaces" are a collection of topics, which are grouped together under the management of a single entity or user.
+  </p>
+
+  <p>
+    In Kafka, the main unit of data is the topic. Users can create and name each topic. They can also delete them, but it is not possible to rename a topic directly. Instead, to rename a topic, the user must create a new topic, move the messages from the original topic to the new, and then delete the original. With this in mind, it is recommended to define logical spaces, based on an hierarchical topic naming structure. This setup can then be combined with security features, such as prefixed ACLs, to isolate different spaces and tenants, while also minimizing the administrative overhead for securing the data in the cluster.
+  </p>
+
+  <p>
+    These logical user spaces can be grouped in different ways, and the concrete choice depends on how your organization prefers to use your Kafka clusters. The most common groupings are as follows.
+  </p>
+
+  <p>
+    <em>By team or organizational unit:</em> Here, the team is the main aggregator. In an organization where teams are the main user of the Kafka infrastructure, this might be the best grouping.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;organization&gt;.&lt;team&gt;.&lt;dataset&gt;.&lt;event-name&gt;</code><br />(e.g., "acme.infosec.telemetry.logins")</li>
+  </ul>
+
+  <p>
+    <em>By project or product:</em> Here, a team manages more than one project. Their credentials will be different for each project, so all the controls and settings will always be project related.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;project&gt;.&lt;product&gt;.&lt;event-name&gt;</code><br />(e.g., "mobility.payments.suspicious")</li>
+  </ul>
+
+  <p>
+    Certain information should normally not be put in a topic name, such as information that is likely to change over time (e.g., the name of the intended consumer) or that is a technical detail or metadata that is available elsewhere (e.g., the topic's partition count and other configuration settings).
+  </p>
+
+  <p>
+  To enforce a topic naming structure, it is useful to disable the Kafka feature to auto-create topics on demand by setting <code>auto.create.topics.enable=false</code> in the broker configuration. This stops users and applications from deliberately or inadvertently creating topics with arbitrary names, thus violating the naming structure. Then, you may want to put in place your own organizational process for controlled, yet automated creation of topics according to your naming convention, using scripting or your favorite automation toolkit.

Review comment:
       I am not sure I get your point here. While I agree that disabling auto topic creation is a good thing, users/apps can still create topics with the admin client, so it does not really help to enforce a topic naming structure. In both cases, the topics would have to respect the ACLs in place and the "namespace", if defined.
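
   As a minimal sketch of the "controlled, yet automated creation of topics" mentioned in the diff, the following shell functions check a topic name against the four-part `<organization>.<team>.<dataset>.<event-name>` structure before creating it. The function names, the regular expression, the broker address, and the partition and replication settings are assumptions made for illustration. They are not part of the pull request.

```shell
# Hypothetical helper: succeeds only if the name has exactly four
# dot-separated segments made of lowercase letters, digits, or hyphens,
# e.g. "acme.infosec.telemetry.logins".
is_valid_topic_name() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9-]+(\.[a-z0-9-]+){3}$'
}

# Hypothetical wrapper: refuses to create topics that violate the convention.
# The broker address and the partition/replication settings are placeholders.
create_tenant_topic() {
    if ! is_valid_topic_name "$1"; then
        echo "Rejected topic name: $1" >&2
        return 1
    fi
    bin/kafka-topics.sh --bootstrap-server broker1:9092 \
        --create --topic "$1" --partitions 3 --replication-factor 3
}
```

   A call such as `create_tenant_topic acme.infosec.telemetry.logins` would pass the check, whereas `create_tenant_topic logins` would be rejected. A wrapper like this only guards the automated path. Clients that use the admin client directly are constrained only by ACLs, for example a prefixed ACL that allows the `Create` operation on topics starting with "acme.infosec.".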
   
   

##########
File path: 27/ops.html
##########
@@ -1090,7 +1090,157 @@ <h4 class="anchor-heading"><a id="georeplication-monitoring" class="anchor-link"
   </p>
 
 
-  <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a href="#config">6.4 Kafka Configuration</a></h3>
+  <h3 class="anchor-heading"><a id="multitenancy" class="anchor-link"></a><a href="#multitenancy">6.4 Multi-Tenancy</a></h3>
+
+  <h4 class="anchor-heading"><a id="multitenancy-overview" class="anchor-link"></a><a href="#multitenancy-overview">Multi-Tenancy Overview</a></h4>
+
+  <p>
+    As a highly scalable event streaming platform, Kafka is used by many users as their central nervous system, connecting in real-time a wide range of different systems and applications from various teams and lines of businesses. Such multi-tenant cluster environments command proper control and management to ensure the peaceful coexistence of these different needs. This section highlights features and best practices to set up such shared environments, which should help you operate clusters that meet SLAs/OLAs and that minimize potential collateral damage caused by "noisy neighbors".
+  </p>
+
+  <p>
+    Multi-tenancy is a many-sided subject, including but not limited to:
+  </p>
+
+  <ul>
+    <li>Creating user spaces for tenants (sometimes called namespaces)</li>
+    <li>Configuring topics with data retention policies and more</li>
+    <li>Securing topics and clusters with encryption, authentication, and authorization</li>
+    <li>Isolating tenants with quotas and rate limits</li>
+    <li>Monitoring and metering</li>
+    <li>Inter-cluster data sharing (cf. geo-replication)</li>
+  </ul>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-naming" class="anchor-link"></a><a href="#multitenancy-topic-naming">Creating User Spaces (Namespaces) For Tenants With Topic Naming</a></h4>
+
+  <p>
+    Kafka administrators operating a multi-tenant cluster typically need to define user spaces for each tenant. For the purpose of this section, "user spaces" are a collection of topics, which are grouped together under the management of a single entity or user.
+  </p>
+
+  <p>
+    In Kafka, the main unit of data is the topic. Users can create and name each topic. They can also delete them, but it is not possible to rename a topic directly. Instead, to rename a topic, the user must create a new topic, move the messages from the original topic to the new, and then delete the original. With this in mind, it is recommended to define logical spaces, based on an hierarchical topic naming structure. This setup can then be combined with security features, such as prefixed ACLs, to isolate different spaces and tenants, while also minimizing the administrative overhead for securing the data in the cluster.
+  </p>
+
+  <p>
+    These logical user spaces can be grouped in different ways, and the concrete choice depends on how your organization prefers to use your Kafka clusters. The most common groupings are as follows.
+  </p>
+
+  <p>
+    <em>By team or organizational unit:</em> Here, the team is the main aggregator. In an organization where teams are the main user of the Kafka infrastructure, this might be the best grouping.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;organization&gt;.&lt;team&gt;.&lt;dataset&gt;.&lt;event-name&gt;</code><br />(e.g., "acme.infosec.telemetry.logins")</li>
+  </ul>
+
+  <p>
+    <em>By project or product:</em> Here, a team manages more than one project. Their credentials will be different for each project, so all the controls and settings will always be project related.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;project&gt;.&lt;product&gt;.&lt;event-name&gt;</code><br />(e.g., "mobility.payments.suspicious")</li>
+  </ul>
+
+  <p>
+    Certain information should normally not be put in a topic name, such as information that is likely to change over time (e.g., the name of the intended consumer) or that is a technical detail or metadata that is available elsewhere (e.g., the topic's partition count and other configuration settings).
+  </p>
+
+  <p>
+  To enforce a topic naming structure, it is useful to disable the Kafka feature to auto-create topics on demand by setting <code>auto.create.topics.enable=false</code> in the broker configuration. This stops users and applications from deliberately or inadvertently creating topics with arbitrary names, thus violating the naming structure. Then, you may want to put in place your own organizational process for controlled, yet automated creation of topics according to your naming convention, using scripting or your favorite automation toolkit.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-configs" class="anchor-link"></a><a href="#multitenancy-topic-configs">Configuring Topics: Data Retention And More</a></h4>
+
+  <p>
+    Kafka's configuration is very flexible due to its fine granularity, and it supports a plethora of <a href="#topicconfigs">per-topic configuration settings</a> to help administrators set up multi-tenant clusters. For example, administrators often need to define data retention policies to control how much and/or for how long data will be stored in a topic, with settings such as <a href="#retention.bytes">retention.bytes</a> (size) and <a href="#retention.ms">retention.ms</a> (time). This limits storage consumption within the cluster, and helps complying with legal requirements such as GDPR.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-security" class="anchor-link"></a><a href="#multitenancy-security">Securing Clusters and Topics: Authentication, Authorization, Encryption</a></h4>
+
+  <p>
+  Because the documentation has a dedicated chapter on <a href="#security">security</a> that applies to any Kafka deployment, this section focuses on additional considerations for multi-tenant environments.
+  </p>
+
+  <p>
+Security settings for Kafka fall into three main categories, which are similar to how administrators would secure other client-server data systems, like relational databases and traditional messaging systems.
+  </p>
+
+  <ol>
+    <li><strong>Encryption</strong> of data transferred between Kafka brokers and Kafka clients, between brokers, between brokers and ZooKeeper nodes, and between brokers and other, optional tools.</li>
+    <li><strong>Authentication</strong> of connections from Kafka clients and applications to Kafka brokers, as well as connections from Kafka brokers to ZooKeeper nodes.</li>
+    <li><strong>Authorization</strong> of client operations such as creating, deleting, and altering the configuration of topics; writing events to or reading events from a topic; creating and deleting ACLs.</li>
+  </ol>
+
+  <p>
+  When securing a multi-tenant Kafka environment, the most common administrative task is the third category (authorization), i.e., managing the user/client permissions that grant or deny access to certain topics and thus to the data stored by users within a cluster. This task is performed predominantly through the <a href="#security_authz">setting of access control lists (ACLs)</a>. Here, administrators of multi-tenant environments in particular benefit from putting a hierarchical topic naming structure in place as described in a previous section, because they can conveniently control access to topics through prefixed ACLs (<code>--resource-pattern-type Prefixed</code>). This significantly minimizes the administrative overhead of securing topics in multi-tenant environments: administrators can make their own trade-offs between higher developer convenience (more lenient permissions, using fewer and broader ACLs) vs. tighter security (more stringent permissions, using more and narrower ACLs).
+  </p>
+
+  <p>
+    In the following example, user Alice—a new member of ACME corporation's InfoSec team—is granted write permissions to all topics whose names start with "acme.infosec.", such as "acme.infosec.telemetry.logins" and "acme.infosec.syslogs.events".
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Grant permissions to user Alice
+$ bin/kafka-acls.sh \
+    --bootstrap-server broker1:9092 \
+    --add --allow-principal User:Alice \
+    --producer \
+    --resource-pattern-type prefixed --topic acme.infosec.
+</code></pre>
+
+  <p>
+    You can similarly use this approach to isolate different customers on the same shared cluster.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-isolation" class="anchor-link"></a><a href="#multitenancy-isolation">Isolating Tenants: Quotas, Rate Limiting, Throttling</a></h4>
+
+  <p>
+  Multi-tenant clusters should generally be configured with <a href="#design_quotas">quotas</a>, which protect against users (tenants) eating up too many cluster resources, such as when they attempt to write or read very high volumes of data, or create requests to brokers at an excessively high rate. This may cause network saturation, monopolize broker resources, and impact other clients—all of which you want to avoid in a shared environment.
+  </p>
+
+  <p>
+    <strong>Client quotas:</strong> Kafka supports different types of (per-user principal) client quotas. Because a client's quotas apply irrespective of which topics the client is writing to or reading from, they are a convenient and effective tool to allocate resources in a multi-tenant cluster. <a href="#design_quotascpu">Request rate quotas</a>, for example, help to limit a user's impact on broker CPU usage by limiting the time a broker spends on the <a href="/protocol.html">request handling path</a> for that user, after which throttling kicks in. In many situations, isolating users with request rate quotas has a bigger impact in multi-tenant clusters than setting incoming/outgoing network bandwidth quotas, because excessive broker CPU usage for processing requests reduces the effective bandwidth the broker can serve. Furthermore, administrators can also define <a href="#brokerconfigs_controller.quota.window.num">quotas on topic operations</a> such as create, delete, and alter to prevent Kafka clusters from being overwhelmed by highly concurrent topic operations (see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations">KIP-599</a>).

Review comment:
       `#brokerconfigs_controller.quota.window.num">` is not the correct config to highlight. `controller_mutation_rate` is more appropriate. We don't have an anchor for it though. I suggest removing the link for now.
   
   I need to add more documentation about the controller quota. We can add a link to it here when it is done.
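
   For context, a minimal sketch of how a per-user quota on topic operations could be set with the `controller_mutation_rate` quota introduced by KIP-599. The wrapper function, the `KAFKA_CONFIGS` variable, the broker address, and the example user and rate are assumptions made for illustration.

```shell
# Path to the Kafka CLI tool. Override it (e.g., with "echo") for a dry run.
KAFKA_CONFIGS="${KAFKA_CONFIGS:-bin/kafka-configs.sh}"

# Hypothetical wrapper around kafka-configs.sh.
#   $1 = user principal name
#   $2 = allowed partition mutations per second (create/delete topics, add partitions)
set_controller_mutation_quota() {
    "$KAFKA_CONFIGS" --bootstrap-server broker1:9092 \
        --alter --add-config "controller_mutation_rate=$2" \
        --entity-type users --entity-name "$1"
}
```

   For example, `set_controller_mutation_quota Alice 10` would limit user Alice to roughly 10 partition mutations per second, after which her topic operations are throttled.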





[GitHub] [kafka-site] miguno commented on a change in pull request #334: KAFKA-12393: Document multi-tenancy considerations

Posted by GitBox <gi...@apache.org>.

miguno commented on a change in pull request #334:
URL: https://github.com/apache/kafka-site/pull/334#discussion_r585572446



##########
File path: 27/ops.html
##########
@@ -1090,7 +1090,157 @@ <h4 class="anchor-heading"><a id="georeplication-monitoring" class="anchor-link"
   </p>
 
 
-  <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a href="#config">6.4 Kafka Configuration</a></h3>
+  <h3 class="anchor-heading"><a id="multitenancy" class="anchor-link"></a><a href="#multitenancy">6.4 Multi-Tenancy</a></h3>
+
+  <h4 class="anchor-heading"><a id="multitenancy-overview" class="anchor-link"></a><a href="#multitenancy-overview">Multi-Tenancy Overview</a></h4>
+
+  <p>
+    As a highly scalable event streaming platform, Kafka is used by many users as their central nervous system, connecting in real-time a wide range of different systems and applications from various teams and lines of businesses. Such multi-tenant cluster environments command proper control and management to ensure the peaceful coexistence of these different needs. This section highlights features and best practices to set up such shared environments, which should help you operate clusters that meet SLAs/OLAs and that minimize potential collateral damage caused by "noisy neighbors".
+  </p>
+
+  <p>
+    Multi-tenancy is a many-sided subject, including but not limited to:
+  </p>
+
+  <ul>
+    <li>Creating user spaces for tenants (sometimes called namespaces)</li>
+    <li>Configuring topics with data retention policies and more</li>
+    <li>Securing topics and clusters with encryption, authentication, and authorization</li>
+    <li>Isolating tenants with quotas and rate limits</li>
+    <li>Monitoring and metering</li>
+    <li>Inter-cluster data sharing (cf. geo-replication)</li>
+  </ul>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-naming" class="anchor-link"></a><a href="#multitenancy-topic-naming">Creating User Spaces (Namespaces) For Tenants With Topic Naming</a></h4>
+
+  <p>
+    Kafka administrators operating a multi-tenant cluster typically need to define user spaces for each tenant. For the purpose of this section, "user spaces" are a collection of topics, which are grouped together under the management of a single entity or user.
+  </p>
+
+  <p>
+    In Kafka, the main unit of data is the topic. Users can create and name each topic. They can also delete them, but it is not possible to rename a topic directly. Instead, to rename a topic, the user must create a new topic, move the messages from the original topic to the new, and then delete the original. With this in mind, it is recommended to define logical spaces, based on an hierarchical topic naming structure. This setup can then be combined with security features, such as prefixed ACLs, to isolate different spaces and tenants, while also minimizing the administrative overhead for securing the data in the cluster.
+  </p>
+
+  <p>
+    These logical user spaces can be grouped in different ways, and the concrete choice depends on how your organization prefers to use your Kafka clusters. The most common groupings are as follows.
+  </p>
+
+  <p>
+    <em>By team or organizational unit:</em> Here, the team is the main aggregator. In an organization where teams are the main user of the Kafka infrastructure, this might be the best grouping.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;organization&gt;.&lt;team&gt;.&lt;dataset&gt;.&lt;event-name&gt;</code><br />(e.g., "acme.infosec.telemetry.logins")</li>
+  </ul>
+
+  <p>
+    <em>By project or product:</em> Here, a team manages more than one project. Their credentials will be different for each project, so all the controls and settings will always be project related.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;project&gt;.&lt;product&gt;.&lt;event-name&gt;</code><br />(e.g., "mobility.payments.suspicious")</li>
+  </ul>
+
+  <p>
+    Certain information should normally not be put in a topic name, such as information that is likely to change over time (e.g., the name of the intended consumer) or that is a technical detail or metadata that is available elsewhere (e.g., the topic's partition count and other configuration settings).
+  </p>
+
+  <p>
+  To enforce a topic naming structure, it is useful to disable the Kafka feature to auto-create topics on demand by setting <code>auto.create.topics.enable=false</code> in the broker configuration. This stops users and applications from deliberately or inadvertently creating topics with arbitrary names, thus violating the naming structure. Then, you may want to put in place your own organizational process for controlled, yet automated creation of topics according to your naming convention, using scripting or your favorite automation toolkit.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-configs" class="anchor-link"></a><a href="#multitenancy-topic-configs">Configuring Topics: Data Retention And More</a></h4>
+
+  <p>
+    Kafka's configuration is very flexible due to its fine granularity, and it supports a plethora of <a href="#topicconfigs">per-topic configuration settings</a> to help administrators set up multi-tenant clusters. For example, administrators often need to define data retention policies to control how much and/or for how long data will be stored in a topic, with settings such as <a href="#retention.bytes">retention.bytes</a> (size) and <a href="#retention.ms">retention.ms</a> (time). This limits storage consumption within the cluster, and helps complying with legal requirements such as GDPR.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-security" class="anchor-link"></a><a href="#multitenancy-security">Securing Clusters and Topics: Authentication, Authorization, Encryption</a></h4>
+
+  <p>
+  Because the documentation has a dedicated chapter on <a href="#security">security</a> that applies to any Kafka deployment, this section focuses on additional considerations for multi-tenant environments.
+  </p>
+
+  <p>
+Security settings for Kafka fall into three main categories, which are similar to how administrators would secure other client-server data systems, like relational databases and traditional messaging systems.
+  </p>
+
+  <ol>
+    <li><strong>Encryption</strong> of data transferred between Kafka brokers and Kafka clients, between brokers, between brokers and ZooKeeper nodes, and between brokers and other, optional tools.</li>
+    <li><strong>Authentication</strong> of connections from Kafka clients and applications to Kafka brokers, as well as connections from Kafka brokers to ZooKeeper nodes.</li>
+    <li><strong>Authorization</strong> of client operations such as creating, deleting, and altering the configuration of topics; writing events to or reading events from a topic; creating and deleting ACLs.</li>
+  </ol>
+
+  <p>
+  When securing a multi-tenant Kafka environment, the most common administrative task is the third category (authorization), i.e., managing the user/client permissions that grant or deny access to certain topics and thus to the data stored by users within a cluster. This task is performed predominantly through the <a href="#security_authz">setting of access control lists (ACLs)</a>. Here, administrators of multi-tenant environments in particular benefit from putting a hierarchical topic naming structure in place as described in a previous section, because they can conveniently control access to topics through prefixed ACLs (<code>--resource-pattern-type Prefixed</code>). This significantly minimizes the administrative overhead of securing topics in multi-tenant environments: administrators can make their own trade-offs between higher developer convenience (more lenient permissions, using fewer and broader ACLs) vs. tighter security (more stringent permissions, using more and narrower ACLs).
+  </p>
+
+  <p>
+    In the following example, user Alice—a new member of ACME corporation's InfoSec team—is granted write permissions to all topics whose names start with "acme.infosec.", such as "acme.infosec.telemetry.logins" and "acme.infosec.syslogs.events".
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Grant permissions to user Alice
+$ bin/kafka-acls.sh \
+    --bootstrap-server broker1:9092 \
+    --add --allow-principal User:Alice \
+    --producer \
+    --resource-pattern-type prefixed --topic acme.infosec.
+</code></pre>
+
+  <p>
+    You can similarly use this approach to isolate different customers on the same shared cluster.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-isolation" class="anchor-link"></a><a href="#multitenancy-isolation">Isolating Tenants: Quotas, Rate Limiting, Throttling</a></h4>
+
+  <p>
+  Multi-tenant clusters should generally be configured with <a href="#design_quotas">quotas</a>, which protect against users (tenants) eating up too many cluster resources, such as when they attempt to write or read very high volumes of data, or create requests to brokers at an excessively high rate. This may cause network saturation, monopolize broker resources, and impact other clients—all of which you want to avoid in a shared environment.
+  </p>
+
+  <p>
+    <strong>Client quotas:</strong> Kafka supports different types of (per-user principal) client quotas. Because a client's quotas apply irrespective of which topics the client is writing to or reading from, they are a convenient and effective tool to allocate resources in a multi-tenant cluster. <a href="#design_quotascpu">Request rate quotas</a>, for example, help to limit a user's impact on broker CPU usage by limiting the time a broker spends on the <a href="/protocol.html">request handling path</a> for that user, after which throttling kicks in. In many situations, isolating users with request rate quotas has a bigger impact in multi-tenant clusters than setting incoming/outgoing network bandwidth quotas, because excessive broker CPU usage for processing requests reduces the effective bandwidth the broker can serve. Furthermore, administrators can also define <a href="#brokerconfigs_controller.quota.window.num">quotas on topic operations</a> such as create, delete, and alter to prevent Kafka clusters from being overwhelmed by highly concurrent topic operations (see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations">KIP-599</a>).
+  </p>
+
+  <p>
+    <strong>Server quotas:</strong> In addition to client-side quotas, Kafka supports different types of broker-side quotas. For example, administrators can set a limit on the rate with which the <a href="#brokerconfigs_max.connection.creation.rate">broker accepts new connections</a>, set the <a href="#brokerconfigs_max.connections">maximum number of connections per broker</a>, or set the maximum number of connections allowed <a href="#brokerconfigs_max.connections.per.ip">from a specific IP address</a>.

Review comment:
       Ack and updated





[GitHub] [kafka-site] rajinisivaram commented on a change in pull request #334: KAFKA-12393: Document multi-tenancy considerations

Posted by GitBox <gi...@apache.org>.

rajinisivaram commented on a change in pull request #334:
URL: https://github.com/apache/kafka-site/pull/334#discussion_r585530580



##########
File path: 27/ops.html
##########
@@ -1090,7 +1090,157 @@ <h4 class="anchor-heading"><a id="georeplication-monitoring" class="anchor-link"
   </p>
 
 
-  <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a href="#config">6.4 Kafka Configuration</a></h3>
+  <h3 class="anchor-heading"><a id="multitenancy" class="anchor-link"></a><a href="#multitenancy">6.4 Multi-Tenancy</a></h3>
+
+  <h4 class="anchor-heading"><a id="multitenancy-overview" class="anchor-link"></a><a href="#multitenancy-overview">Multi-Tenancy Overview</a></h4>
+
+  <p>
+    As a highly scalable event streaming platform, Kafka is used by many users as their central nervous system, connecting in real-time a wide range of different systems and applications from various teams and lines of businesses. Such multi-tenant cluster environments command proper control and management to ensure the peaceful coexistence of these different needs. This section highlights features and best practices to set up such shared environments, which should help you operate clusters that meet SLAs/OLAs and that minimize potential collateral damage caused by "noisy neighbors".
+  </p>
+
+  <p>
+    Multi-tenancy is a many-sided subject, including but not limited to:
+  </p>
+
+  <ul>
+    <li>Creating user spaces for tenants (sometimes called namespaces)</li>
+    <li>Configuring topics with data retention policies and more</li>
+    <li>Securing topics and clusters with encryption, authentication, and authorization</li>
+    <li>Isolating tenants with quotas and rate limits</li>
+    <li>Monitoring and metering</li>
+    <li>Inter-cluster data sharing (cf. geo-replication)</li>
+  </ul>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-naming" class="anchor-link"></a><a href="#multitenancy-topic-naming">Creating User Spaces (Namespaces) For Tenants With Topic Naming</a></h4>
+
+  <p>
+    Kafka administrators operating a multi-tenant cluster typically need to define user spaces for each tenant. For the purpose of this section, "user spaces" are a collection of topics, which are grouped together under the management of a single entity or user.
+  </p>
+
+  <p>
+    In Kafka, the main unit of data is the topic. Users can create and name each topic. They can also delete them, but it is not possible to rename a topic directly. Instead, to rename a topic, the user must create a new topic, move the messages from the original topic to the new one, and then delete the original. With this in mind, it is recommended to define logical spaces based on a hierarchical topic naming structure. This setup can then be combined with security features, such as prefixed ACLs, to isolate different spaces and tenants, while also minimizing the administrative overhead for securing the data in the cluster.
+  </p>
+
+  <p>
+    These logical user spaces can be grouped in different ways, and the concrete choice depends on how your organization prefers to use your Kafka clusters. The most common groupings are as follows.
+  </p>
+
+  <p>
+    <em>By team or organizational unit:</em> Here, the team is the main aggregator. In an organization where teams are the main user of the Kafka infrastructure, this might be the best grouping.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;organization&gt;.&lt;team&gt;.&lt;dataset&gt;.&lt;event-name&gt;</code><br />(e.g., "acme.infosec.telemetry.logins")</li>
+  </ul>
+
+  <p>
+    <em>By project or product:</em> Here, a team manages more than one project. Their credentials will be different for each project, so all the controls and settings will always be project related.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;project&gt;.&lt;product&gt;.&lt;event-name&gt;</code><br />(e.g., "mobility.payments.suspicious")</li>
+  </ul>
+
+  <p>
+    Certain information should normally not be put in a topic name, such as information that is likely to change over time (e.g., the name of the intended consumer) or that is a technical detail or metadata that is available elsewhere (e.g., the topic's partition count and other configuration settings).
+  </p>
+
+  <p>
+  To enforce a topic naming structure, it is useful to disable the Kafka feature to auto-create topics on demand by setting <code>auto.create.topics.enable=false</code> in the broker configuration. This stops users and applications from deliberately or inadvertently creating topics with arbitrary names, thus violating the naming structure. Then, you may want to put in place your own organizational process for controlled, yet automated creation of topics according to your naming convention, using scripting or your favorite automation toolkit.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-configs" class="anchor-link"></a><a href="#multitenancy-topic-configs">Configuring Topics: Data Retention And More</a></h4>
+
+  <p>
+    Kafka's configuration is very flexible due to its fine granularity, and it supports a plethora of <a href="#topicconfigs">per-topic configuration settings</a> to help administrators set up multi-tenant clusters. For example, administrators often need to define data retention policies to control how much and/or for how long data will be stored in a topic, with settings such as <a href="#retention.bytes">retention.bytes</a> (size) and <a href="#retention.ms">retention.ms</a> (time). This limits storage consumption within the cluster, and helps comply with legal requirements such as GDPR.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-security" class="anchor-link"></a><a href="#multitenancy-security">Securing Clusters and Topics: Authentication, Authorization, Encryption</a></h4>
+
+  <p>
+  Because the documentation has a dedicated chapter on <a href="#security">security</a> that applies to any Kafka deployment, this section focuses on additional considerations for multi-tenant environments.
+  </p>
+
+  <p>
+Security settings for Kafka fall into three main categories, which are similar to how administrators would secure other client-server data systems, like relational databases and traditional messaging systems.
+  </p>
+
+  <ol>
+    <li><strong>Encryption</strong> of data transferred between Kafka brokers and Kafka clients, between brokers, between brokers and ZooKeeper nodes, and between brokers and other, optional tools.</li>
+    <li><strong>Authentication</strong> of connections from Kafka clients and applications to Kafka brokers, as well as connections from Kafka brokers to ZooKeeper nodes.</li>
+    <li><strong>Authorization</strong> of client operations such as creating, deleting, and altering the configuration of topics; writing events to or reading events from a topic; creating and deleting ACLs.</li>

Review comment:
       Should we also talk about `policies` like CreateTopicPolicy/AlterConfigPolicy that also support additional restrictions?

##########
File path: 27/ops.html
##########
@@ -1090,7 +1090,157 @@ <h4 class="anchor-heading"><a id="georeplication-monitoring" class="anchor-link"
   </p>
 
 
-  <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a href="#config">6.4 Kafka Configuration</a></h3>
+  <h3 class="anchor-heading"><a id="multitenancy" class="anchor-link"></a><a href="#multitenancy">6.4 Multi-Tenancy</a></h3>
+
+  <h4 class="anchor-heading"><a id="multitenancy-overview" class="anchor-link"></a><a href="#multitenancy-overview">Multi-Tenancy Overview</a></h4>
+
+  <p>
+    As a highly scalable event streaming platform, Kafka is used by many users as their central nervous system, connecting in real-time a wide range of different systems and applications from various teams and lines of businesses. Such multi-tenant cluster environments command proper control and management to ensure the peaceful coexistence of these different needs. This section highlights features and best practices to set up such shared environments, which should help you operate clusters that meet SLAs/OLAs and that minimize potential collateral damage caused by "noisy neighbors".
+  </p>
+
+  <p>
+    Multi-tenancy is a many-sided subject, including but not limited to:
+  </p>
+
+  <ul>
+    <li>Creating user spaces for tenants (sometimes called namespaces)</li>
+    <li>Configuring topics with data retention policies and more</li>
+    <li>Securing topics and clusters with encryption, authentication, and authorization</li>
+    <li>Isolating tenants with quotas and rate limits</li>
+    <li>Monitoring and metering</li>
+    <li>Inter-cluster data sharing (cf. geo-replication)</li>
+  </ul>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-naming" class="anchor-link"></a><a href="#multitenancy-topic-naming">Creating User Spaces (Namespaces) For Tenants With Topic Naming</a></h4>
+
+  <p>
+    Kafka administrators operating a multi-tenant cluster typically need to define user spaces for each tenant. For the purpose of this section, "user spaces" are a collection of topics, which are grouped together under the management of a single entity or user.
+  </p>
+
+  <p>
+    In Kafka, the main unit of data is the topic. Users can create and name each topic. They can also delete them, but it is not possible to rename a topic directly. Instead, to rename a topic, the user must create a new topic, move the messages from the original topic to the new one, and then delete the original. With this in mind, it is recommended to define logical spaces based on a hierarchical topic naming structure. This setup can then be combined with security features, such as prefixed ACLs, to isolate different spaces and tenants, while also minimizing the administrative overhead for securing the data in the cluster.
+  </p>
+
+  <p>
+    These logical user spaces can be grouped in different ways, and the concrete choice depends on how your organization prefers to use your Kafka clusters. The most common groupings are as follows.
+  </p>
+
+  <p>
+    <em>By team or organizational unit:</em> Here, the team is the main aggregator. In an organization where teams are the main user of the Kafka infrastructure, this might be the best grouping.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;organization&gt;.&lt;team&gt;.&lt;dataset&gt;.&lt;event-name&gt;</code><br />(e.g., "acme.infosec.telemetry.logins")</li>
+  </ul>
+
+  <p>
+    <em>By project or product:</em> Here, a team manages more than one project. Their credentials will be different for each project, so all the controls and settings will always be project related.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;project&gt;.&lt;product&gt;.&lt;event-name&gt;</code><br />(e.g., "mobility.payments.suspicious")</li>
+  </ul>
+
+  <p>
+    Certain information should normally not be put in a topic name, such as information that is likely to change over time (e.g., the name of the intended consumer) or that is a technical detail or metadata that is available elsewhere (e.g., the topic's partition count and other configuration settings).
+  </p>
+
+  <p>
+  To enforce a topic naming structure, it is useful to disable the Kafka feature to auto-create topics on demand by setting <code>auto.create.topics.enable=false</code> in the broker configuration. This stops users and applications from deliberately or inadvertently creating topics with arbitrary names, thus violating the naming structure. Then, you may want to put in place your own organizational process for controlled, yet automated creation of topics according to your naming convention, using scripting or your favorite automation toolkit.

Review comment:
       Yes, the text sounds like you need to disable auto topic creation to enforce topic ACLs which is not the case, we should rewrite that part.
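
       To illustrate the point: with an authorizer enabled, the topic names a principal may create can be limited through ACLs alone. A minimal sketch, assuming the broker address, principal, and prefix from the PR's own example (all illustrative). It only assembles and prints the command so that it can be reviewed before being run against a real cluster:

```shell
# Assumed, illustrative values taken from the example in the PR text.
BOOTSTRAP_SERVER="broker1:9092"
PRINCIPAL="User:Alice"
TOPIC_PREFIX="acme.infosec."

# Allow the Create operation only for topics whose names start with the prefix.
ACL_CMD="bin/kafka-acls.sh --bootstrap-server $BOOTSTRAP_SERVER --add --allow-principal $PRINCIPAL --operation Create --resource-pattern-type prefixed --topic $TOPIC_PREFIX"

# Dry run: print the command instead of executing it.
echo "$ACL_CMD"
```

       If no other ACL grants Create to this principal, requests to create topics outside the prefix should be rejected by the authorizer, independent of the auto-create setting.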

##########
File path: 27/ops.html
##########
@@ -1090,7 +1090,157 @@ <h4 class="anchor-heading"><a id="georeplication-monitoring" class="anchor-link"
   </p>
 
 
-  <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a href="#config">6.4 Kafka Configuration</a></h3>
+  <h3 class="anchor-heading"><a id="multitenancy" class="anchor-link"></a><a href="#multitenancy">6.4 Multi-Tenancy</a></h3>
+
+  <h4 class="anchor-heading"><a id="multitenancy-overview" class="anchor-link"></a><a href="#multitenancy-overview">Multi-Tenancy Overview</a></h4>
+
+  <p>
+    As a highly scalable event streaming platform, Kafka is used by many users as their central nervous system, connecting in real-time a wide range of different systems and applications from various teams and lines of businesses. Such multi-tenant cluster environments command proper control and management to ensure the peaceful coexistence of these different needs. This section highlights features and best practices to set up such shared environments, which should help you operate clusters that meet SLAs/OLAs and that minimize potential collateral damage caused by "noisy neighbors".
+  </p>
+
+  <p>
+    Multi-tenancy is a many-sided subject, including but not limited to:
+  </p>
+
+  <ul>
+    <li>Creating user spaces for tenants (sometimes called namespaces)</li>
+    <li>Configuring topics with data retention policies and more</li>
+    <li>Securing topics and clusters with encryption, authentication, and authorization</li>
+    <li>Isolating tenants with quotas and rate limits</li>
+    <li>Monitoring and metering</li>
+    <li>Inter-cluster data sharing (cf. geo-replication)</li>
+  </ul>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-naming" class="anchor-link"></a><a href="#multitenancy-topic-naming">Creating User Spaces (Namespaces) For Tenants With Topic Naming</a></h4>
+
+  <p>
+    Kafka administrators operating a multi-tenant cluster typically need to define user spaces for each tenant. For the purpose of this section, "user spaces" are a collection of topics, which are grouped together under the management of a single entity or user.
+  </p>
+
+  <p>
+    In Kafka, the main unit of data is the topic. Users can create and name each topic. They can also delete them, but it is not possible to rename a topic directly. Instead, to rename a topic, the user must create a new topic, move the messages from the original topic to the new one, and then delete the original. With this in mind, it is recommended to define logical spaces based on a hierarchical topic naming structure. This setup can then be combined with security features, such as prefixed ACLs, to isolate different spaces and tenants, while also minimizing the administrative overhead for securing the data in the cluster.
+  </p>
+
+  <p>
+    These logical user spaces can be grouped in different ways, and the concrete choice depends on how your organization prefers to use your Kafka clusters. The most common groupings are as follows.
+  </p>
+
+  <p>
+    <em>By team or organizational unit:</em> Here, the team is the main aggregator. In an organization where teams are the main user of the Kafka infrastructure, this might be the best grouping.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;organization&gt;.&lt;team&gt;.&lt;dataset&gt;.&lt;event-name&gt;</code><br />(e.g., "acme.infosec.telemetry.logins")</li>
+  </ul>
+
+  <p>
+    <em>By project or product:</em> Here, a team manages more than one project. Their credentials will be different for each project, so all the controls and settings will always be project related.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;project&gt;.&lt;product&gt;.&lt;event-name&gt;</code><br />(e.g., "mobility.payments.suspicious")</li>
+  </ul>
+
+  <p>
+    Certain information should normally not be put in a topic name, such as information that is likely to change over time (e.g., the name of the intended consumer) or that is a technical detail or metadata that is available elsewhere (e.g., the topic's partition count and other configuration settings).
+  </p>
+
+  <p>
+  To enforce a topic naming structure, it is useful to disable the Kafka feature to auto-create topics on demand by setting <code>auto.create.topics.enable=false</code> in the broker configuration. This stops users and applications from deliberately or inadvertently creating topics with arbitrary names, thus violating the naming structure. Then, you may want to put in place your own organizational process for controlled, yet automated creation of topics according to your naming convention, using scripting or your favorite automation toolkit.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-configs" class="anchor-link"></a><a href="#multitenancy-topic-configs">Configuring Topics: Data Retention And More</a></h4>
+
+  <p>
+    Kafka's configuration is very flexible due to its fine granularity, and it supports a plethora of <a href="#topicconfigs">per-topic configuration settings</a> to help administrators set up multi-tenant clusters. For example, administrators often need to define data retention policies to control how much and/or for how long data will be stored in a topic, with settings such as <a href="#retention.bytes">retention.bytes</a> (size) and <a href="#retention.ms">retention.ms</a> (time). This limits storage consumption within the cluster, and helps comply with legal requirements such as GDPR.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-security" class="anchor-link"></a><a href="#multitenancy-security">Securing Clusters and Topics: Authentication, Authorization, Encryption</a></h4>
+
+  <p>
+  Because the documentation has a dedicated chapter on <a href="#security">security</a> that applies to any Kafka deployment, this section focuses on additional considerations for multi-tenant environments.
+  </p>
+
+  <p>
+Security settings for Kafka fall into three main categories, which are similar to how administrators would secure other client-server data systems, like relational databases and traditional messaging systems.
+  </p>
+
+  <ol>
+    <li><strong>Encryption</strong> of data transferred between Kafka brokers and Kafka clients, between brokers, between brokers and ZooKeeper nodes, and between brokers and other, optional tools.</li>
+    <li><strong>Authentication</strong> of connections from Kafka clients and applications to Kafka brokers, as well as connections from Kafka brokers to ZooKeeper nodes.</li>
+    <li><strong>Authorization</strong> of client operations such as creating, deleting, and altering the configuration of topics; writing events to or reading events from a topic; creating and deleting ACLs.</li>
+  </ol>
+
+  <p>
+  When securing a multi-tenant Kafka environment, the most common administrative task is the third category (authorization), i.e., managing the user/client permissions that grant or deny access to certain topics and thus to the data stored by users within a cluster. This task is performed predominantly through the <a href="#security_authz">setting of access control lists (ACLs)</a>. Here, administrators of multi-tenant environments in particular benefit from putting a hierarchical topic naming structure in place as described in a previous section, because they can conveniently control access to topics through prefixed ACLs (<code>--resource-pattern-type Prefixed</code>). This significantly minimizes the administrative overhead of securing topics in multi-tenant environments: administrators can make their own trade-offs between higher developer convenience (more lenient permissions, using fewer and broader ACLs) vs. tighter security (more stringent permissions, using more and narrower ACLs).
+  </p>
+
+  <p>
+    In the following example, user Alice—a new member of ACME corporation's InfoSec team—is granted write permissions to all topics whose names start with "acme.infosec.", such as "acme.infosec.telemetry.logins" and "acme.infosec.syslogs.events".
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Grant permissions to user Alice
+$ bin/kafka-acls.sh \
+    --bootstrap-server broker1:9092 \
+    --add --allow-principal User:Alice \
+    --producer \
+    --resource-pattern-type prefixed --topic acme.infosec.
+</code></pre>
+
+  <p>
+    You can similarly use this approach to isolate different customers on the same shared cluster.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-isolation" class="anchor-link"></a><a href="#multitenancy-isolation">Isolating Tenants: Quotas, Rate Limiting, Throttling</a></h4>
+
+  <p>
+  Multi-tenant clusters should generally be configured with <a href="#design_quotas">quotas</a>, which protect against users (tenants) eating up too many cluster resources, such as when they attempt to write or read very high volumes of data, or create requests to brokers at an excessively high rate. This may cause network saturation, monopolize broker resources, and impact other clients—all of which you want to avoid in a shared environment.
+  </p>
+
+  <p>
+    <strong>Client quotas:</strong> Kafka supports different types of (per-user principal) client quotas. Because a client's quotas apply irrespective of which topics the client is writing to or reading from, they are a convenient and effective tool to allocate resources in a multi-tenant cluster. <a href="#design_quotascpu">Request rate quotas</a>, for example, help to limit a user's impact on broker CPU usage by limiting the time a broker spends on the <a href="/protocol.html">request handling path</a> for that user, after which throttling kicks in. In many situations, isolating users with request rate quotas has a bigger impact in multi-tenant clusters than setting incoming/outgoing network bandwidth quotas, because excessive broker CPU usage for processing requests reduces the effective bandwidth the broker can serve. Furthermore, administrators can also define <a href="#brokerconfigs_controller.quota.window.num">quotas on topic operations</a> such as create, delete, and alter to prevent Kafka clusters from being overwhelmed by highly concurrent topic operations (see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations">KIP-599</a>).
+  </p>
+
+  <p>
+    <strong>Server quotas:</strong> In addition to client-side quotas, Kafka supports different types of broker-side quotas. For example, administrators can set a limit on the rate with which the <a href="#brokerconfigs_max.connection.creation.rate">broker accepts new connections</a>, set the <a href="#brokerconfigs_max.connections">maximum number of connections per broker</a>, or set the maximum number of connections allowed <a href="#brokerconfigs_max.connections.per.ip">from a specific IP address</a>.

Review comment:
       Don't think we would refer to request rate or bandwidth quotas as `client-side`. They apply to clients, but are protecting broker resources and are enforced by brokers.
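
       For reference, such quotas are defined per user principal (or client id) and enforced by the brokers. A minimal sketch, assuming an illustrative principal and limits (none of these values come from the PR). It prints the command rather than running it, so it can be checked before being applied to a real cluster:

```shell
# Assumed, illustrative values; adjust to the tenant in question.
QUOTA_BOOTSTRAP="broker1:9092"
QUOTA_USER="Alice"

# producer_byte_rate / consumer_byte_rate: bytes per second.
# request_percentage: share of request handler and network thread time, in percent.
QUOTA_CONFIG="producer_byte_rate=1048576,consumer_byte_rate=2097152,request_percentage=200"

QUOTA_CMD="bin/kafka-configs.sh --bootstrap-server $QUOTA_BOOTSTRAP --alter --add-config $QUOTA_CONFIG --entity-type users --entity-name $QUOTA_USER"

# Dry run: print the command instead of executing it.
echo "$QUOTA_CMD"
```

       The quota is attached to the user entity, so it applies regardless of which topics that user reads from or writes to.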





[GitHub] [kafka-site] miguno commented on a change in pull request #334: KAFKA-12393: Document multi-tenancy considerations

Posted by GitBox <gi...@apache.org>.

miguno commented on a change in pull request #334:
URL: https://github.com/apache/kafka-site/pull/334#discussion_r585572446



##########
File path: 27/ops.html
##########
@@ -1090,7 +1090,157 @@ <h4 class="anchor-heading"><a id="georeplication-monitoring" class="anchor-link"
   </p>
 
 
-  <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a href="#config">6.4 Kafka Configuration</a></h3>
+  <h3 class="anchor-heading"><a id="multitenancy" class="anchor-link"></a><a href="#multitenancy">6.4 Multi-Tenancy</a></h3>
+
+  <h4 class="anchor-heading"><a id="multitenancy-overview" class="anchor-link"></a><a href="#multitenancy-overview">Multi-Tenancy Overview</a></h4>
+
+  <p>
+    As a highly scalable event streaming platform, Kafka is used by many users as their central nervous system, connecting in real-time a wide range of different systems and applications from various teams and lines of businesses. Such multi-tenant cluster environments command proper control and management to ensure the peaceful coexistence of these different needs. This section highlights features and best practices to set up such shared environments, which should help you operate clusters that meet SLAs/OLAs and that minimize potential collateral damage caused by "noisy neighbors".
+  </p>
+
+  <p>
+    Multi-tenancy is a many-sided subject, including but not limited to:
+  </p>
+
+  <ul>
+    <li>Creating user spaces for tenants (sometimes called namespaces)</li>
+    <li>Configuring topics with data retention policies and more</li>
+    <li>Securing topics and clusters with encryption, authentication, and authorization</li>
+    <li>Isolating tenants with quotas and rate limits</li>
+    <li>Monitoring and metering</li>
+    <li>Inter-cluster data sharing (cf. geo-replication)</li>
+  </ul>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-naming" class="anchor-link"></a><a href="#multitenancy-topic-naming">Creating User Spaces (Namespaces) For Tenants With Topic Naming</a></h4>
+
+  <p>
+    Kafka administrators operating a multi-tenant cluster typically need to define user spaces for each tenant. For the purpose of this section, "user spaces" are a collection of topics, which are grouped together under the management of a single entity or user.
+  </p>
+
+  <p>
+    In Kafka, the main unit of data is the topic. Users can create and name each topic. They can also delete them, but it is not possible to rename a topic directly. Instead, to rename a topic, the user must create a new topic, move the messages from the original topic to the new one, and then delete the original. With this in mind, it is recommended to define logical spaces based on a hierarchical topic naming structure. This setup can then be combined with security features, such as prefixed ACLs, to isolate different spaces and tenants, while also minimizing the administrative overhead for securing the data in the cluster.
+  </p>
+
+  <p>
+    These logical user spaces can be grouped in different ways, and the concrete choice depends on how your organization prefers to use your Kafka clusters. The most common groupings are as follows.
+  </p>
+
+  <p>
+    <em>By team or organizational unit:</em> Here, the team is the main aggregator. In an organization where teams are the main user of the Kafka infrastructure, this might be the best grouping.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;organization&gt;.&lt;team&gt;.&lt;dataset&gt;.&lt;event-name&gt;</code><br />(e.g., "acme.infosec.telemetry.logins")</li>
+  </ul>
+
+  <p>
+    <em>By project or product:</em> Here, a team manages more than one project. Their credentials will be different for each project, so all the controls and settings will always be project related.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;project&gt;.&lt;product&gt;.&lt;event-name&gt;</code><br />(e.g., "mobility.payments.suspicious")</li>
+  </ul>
+
+  <p>
+    Certain information should normally not be put in a topic name, such as information that is likely to change over time (e.g., the name of the intended consumer) or that is a technical detail or metadata that is available elsewhere (e.g., the topic's partition count and other configuration settings).
+  </p>
+
+  <p>
+  To enforce a topic naming structure, it is useful to disable the Kafka feature to auto-create topics on demand by setting <code>auto.create.topics.enable=false</code> in the broker configuration. This stops users and applications from deliberately or inadvertently creating topics with arbitrary names, thus violating the naming structure. Then, you may want to put in place your own organizational process for controlled, yet automated creation of topics according to your naming convention, using scripting or your favorite automation toolkit.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-configs" class="anchor-link"></a><a href="#multitenancy-topic-configs">Configuring Topics: Data Retention And More</a></h4>
+
+  <p>
+    Kafka's configuration is very flexible due to its fine granularity, and it supports a plethora of <a href="#topicconfigs">per-topic configuration settings</a> to help administrators set up multi-tenant clusters. For example, administrators often need to define data retention policies to control how much and/or for how long data will be stored in a topic, with settings such as <a href="#retention.bytes">retention.bytes</a> (size) and <a href="#retention.ms">retention.ms</a> (time). This limits storage consumption within the cluster, and helps comply with legal requirements such as GDPR.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-security" class="anchor-link"></a><a href="#multitenancy-security">Securing Clusters and Topics: Authentication, Authorization, Encryption</a></h4>
+
+  <p>
+  Because the documentation has a dedicated chapter on <a href="#security">security</a> that applies to any Kafka deployment, this section focuses on additional considerations for multi-tenant environments.
+  </p>
+
+  <p>
+Security settings for Kafka fall into three main categories, which are similar to how administrators would secure other client-server data systems, like relational databases and traditional messaging systems.
+  </p>
+
+  <ol>
+    <li><strong>Encryption</strong> of data transferred between Kafka brokers and Kafka clients, between brokers, between brokers and ZooKeeper nodes, and between brokers and other, optional tools.</li>
+    <li><strong>Authentication</strong> of connections from Kafka clients and applications to Kafka brokers, as well as connections from Kafka brokers to ZooKeeper nodes.</li>
+    <li><strong>Authorization</strong> of client operations such as creating, deleting, and altering the configuration of topics; writing events to or reading events from a topic; creating and deleting ACLs.</li>
+  </ol>
+
+  <p>
+  When securing a multi-tenant Kafka environment, the most common administrative task is the third category (authorization), i.e., managing the user/client permissions that grant or deny access to certain topics and thus to the data stored by users within a cluster. This task is performed predominantly through the <a href="#security_authz">setting of access control lists (ACLs)</a>. Here, administrators of multi-tenant environments in particular benefit from putting a hierarchical topic naming structure in place as described in a previous section, because they can conveniently control access to topics through prefixed ACLs (<code>--resource-pattern-type prefixed</code>). This significantly reduces the administrative overhead of securing topics in multi-tenant environments: administrators can make their own trade-offs between higher developer convenience (more lenient permissions, using fewer and broader ACLs) vs. tighter security (more stringent permissions, using more and narrower ACLs).
+  </p>
+
+  <p>
+    In the following example, user Alice—a new member of ACME corporation's InfoSec team—is granted write permissions to all topics whose names start with "acme.infosec.", such as "acme.infosec.telemetry.logins" and "acme.infosec.syslogs.events".
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Grant permissions to user Alice
+$ bin/kafka-acls.sh \
+    --bootstrap-server broker1:9092 \
+    --add --allow-principal User:Alice \
+    --producer \
+    --resource-pattern-type prefixed --topic acme.infosec.
+</code></pre>
+
+  <p>
+    You can similarly use this approach to isolate different customers on the same shared cluster.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-isolation" class="anchor-link"></a><a href="#multitenancy-isolation">Isolating Tenants: Quotas, Rate Limiting, Throttling</a></h4>
+
+  <p>
+  Multi-tenant clusters should generally be configured with <a href="#design_quotas">quotas</a>, which protect against users (tenants) eating up too many cluster resources, such as when they attempt to write or read very high volumes of data, or send requests to brokers at an excessively high rate. Such behavior may cause network saturation, monopolize broker resources, and impact other clients, all of which you want to avoid in a shared environment.
+  </p>
+
+  <p>
+    <strong>Client quotas:</strong> Kafka supports different types of (per-user principal) client quotas. Because a client's quotas apply irrespective of which topics the client is writing to or reading from, they are a convenient and effective tool to allocate resources in a multi-tenant cluster. <a href="#design_quotascpu">Request rate quotas</a>, for example, help to limit a user's impact on broker CPU usage by limiting the time a broker spends on the <a href="/protocol.html">request handling path</a> for that user, after which throttling kicks in. In many situations, isolating users with request rate quotas has a bigger impact in multi-tenant clusters than setting incoming/outgoing network bandwidth quotas, because excessive broker CPU usage for processing requests reduces the effective bandwidth the broker can serve. Furthermore, administrators can also define <a href="#brokerconfigs_controller.quota.window.num">quotas on topic operations</a> such as create, delete, and alter to prevent Kafka clusters from being overwhelmed by highly concurrent topic operations (see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations">KIP-599</a>).
+  </p>
+
+  <p>
+    <strong>Server quotas:</strong> In addition to client-side quotas, Kafka supports different types of broker-side quotas. For example, administrators can set a limit on the rate at which the <a href="#brokerconfigs_max.connection.creation.rate">broker accepts new connections</a>, set the <a href="#brokerconfigs_max.connections">maximum number of connections per broker</a>, or set the maximum number of connections allowed <a href="#brokerconfigs_max.connections.per.ip">from a specific IP address</a>.

Review comment:
       Ack
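
Side note, not part of the PR diff: the client quotas described in the hunk above might be set for a single user principal roughly as sketched below. The broker address `broker1:9092` and the user `Alice` are borrowed from the ACL example in the hunk; the quota values and the `build_quota_cmd` helper are hypothetical. The sketch assumes a Kafka version whose `kafka-configs.sh` accepts `--bootstrap-server` for user quotas. The helper only prints the command, so it can be reviewed before anyone runs it against a real cluster.

```shell
# Hypothetical helper: prints (does not execute) a kafka-configs.sh command
# that sets per-user client quotas. All quota values are illustrative.
#   $1 = user principal name (e.g. Alice)
#   $2 = producer_byte_rate in bytes/second
#   $3 = consumer_byte_rate in bytes/second
#   $4 = request_percentage (share of request handling time)
build_quota_cmd() {
  echo "bin/kafka-configs.sh --bootstrap-server broker1:9092 --alter" \
       "--add-config producer_byte_rate=$2,consumer_byte_rate=$3,request_percentage=$4" \
       "--entity-type users --entity-name $1"
}

# Example: 1 MiB/s produce, 2 MiB/s consume, 50% request time for Alice.
build_quota_cmd Alice 1048576 2097152 50
```

Because the quota is attached to the user principal rather than to a topic, it applies regardless of which topics Alice writes to or reads from, which matches the point made in the "Client quotas" paragraph.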





[GitHub] [kafka-site] miguno commented on a change in pull request #334: KAFKA-12393: Document multi-tenancy considerations

Posted by GitBox <gi...@apache.org>.

miguno commented on a change in pull request #334:
URL: https://github.com/apache/kafka-site/pull/334#discussion_r585575690



##########
File path: 27/ops.html
##########
@@ -1090,7 +1090,157 @@ <h4 class="anchor-heading"><a id="georeplication-monitoring" class="anchor-link"
   </p>
 
 
-  <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a href="#config">6.4 Kafka Configuration</a></h3>
+  <h3 class="anchor-heading"><a id="multitenancy" class="anchor-link"></a><a href="#multitenancy">6.4 Multi-Tenancy</a></h3>
+
+  <h4 class="anchor-heading"><a id="multitenancy-overview" class="anchor-link"></a><a href="#multitenancy-overview">Multi-Tenancy Overview</a></h4>
+
+  <p>
+    As a highly scalable event streaming platform, Kafka is used by many users as their central nervous system, connecting in real-time a wide range of different systems and applications from various teams and lines of business. Such multi-tenant cluster environments demand proper control and management to ensure the peaceful coexistence of these different needs. This section highlights features and best practices to set up such shared environments, which should help you operate clusters that meet SLAs/OLAs and that minimize potential collateral damage caused by "noisy neighbors".
+  </p>
+
+  <p>
+    Multi-tenancy is a many-sided subject, including but not limited to:
+  </p>
+
+  <ul>
+    <li>Creating user spaces for tenants (sometimes called namespaces)</li>
+    <li>Configuring topics with data retention policies and more</li>
+    <li>Securing topics and clusters with encryption, authentication, and authorization</li>
+    <li>Isolating tenants with quotas and rate limits</li>
+    <li>Monitoring and metering</li>
+    <li>Inter-cluster data sharing (cf. geo-replication)</li>
+  </ul>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-naming" class="anchor-link"></a><a href="#multitenancy-topic-naming">Creating User Spaces (Namespaces) For Tenants With Topic Naming</a></h4>
+
+  <p>
+    Kafka administrators operating a multi-tenant cluster typically need to define user spaces for each tenant. For the purpose of this section, "user spaces" are a collection of topics, which are grouped together under the management of a single entity or user.
+  </p>
+
+  <p>
+    In Kafka, the main unit of data is the topic. Users can create and name each topic. They can also delete them, but it is not possible to rename a topic directly. Instead, to rename a topic, the user must create a new topic, move the messages from the original topic to the new one, and then delete the original. With this in mind, it is recommended to define logical spaces, based on a hierarchical topic naming structure. This setup can then be combined with security features, such as prefixed ACLs, to isolate different spaces and tenants, while also minimizing the administrative overhead for securing the data in the cluster.
+  </p>
+
+  <p>
+    These logical user spaces can be grouped in different ways, and the concrete choice depends on how your organization prefers to use your Kafka clusters. The most common groupings are as follows.
+  </p>
+
+  <p>
+    <em>By team or organizational unit:</em> Here, the team is the main aggregator. In an organization where teams are the main users of the Kafka infrastructure, this might be the best grouping.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;organization&gt;.&lt;team&gt;.&lt;dataset&gt;.&lt;event-name&gt;</code><br />(e.g., "acme.infosec.telemetry.logins")</li>
+  </ul>
+
+  <p>
+    <em>By project or product:</em> Here, a team manages more than one project. Their credentials will be different for each project, so all the controls and settings will always be project-related.
+  </p>
+
+  <p>
+    Example topic naming structure:
+  </p>
+
+  <ul>
+    <li><code>&lt;project&gt;.&lt;product&gt;.&lt;event-name&gt;</code><br />(e.g., "mobility.payments.suspicious")</li>
+  </ul>
+
+  <p>
+    Certain information should normally not be put in a topic name, such as information that is likely to change over time (e.g., the name of the intended consumer) or that is a technical detail or metadata that is available elsewhere (e.g., the topic's partition count and other configuration settings).
+  </p>
+
+  <p>
+  To enforce a topic naming structure, it is useful to disable the Kafka feature to auto-create topics on demand by setting <code>auto.create.topics.enable=false</code> in the broker configuration. This stops users and applications from deliberately or inadvertently creating topics with arbitrary names, thus violating the naming structure. Then, you may want to put in place your own organizational process for controlled, yet automated creation of topics according to your naming convention, using scripting or your favorite automation toolkit.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-topic-configs" class="anchor-link"></a><a href="#multitenancy-topic-configs">Configuring Topics: Data Retention And More</a></h4>
+
+  <p>
+    Kafka's configuration is very flexible due to its fine granularity, and it supports a plethora of <a href="#topicconfigs">per-topic configuration settings</a> to help administrators set up multi-tenant clusters. For example, administrators often need to define data retention policies to control how much and/or for how long data will be stored in a topic, with settings such as <a href="#retention.bytes">retention.bytes</a> (size) and <a href="#retention.ms">retention.ms</a> (time). This limits storage consumption within the cluster, and helps you comply with legal requirements such as GDPR.
+  </p>
+
+  <h4 class="anchor-heading"><a id="multitenancy-security" class="anchor-link"></a><a href="#multitenancy-security">Securing Clusters and Topics: Authentication, Authorization, Encryption</a></h4>
+
+  <p>
+  Because the documentation has a dedicated chapter on <a href="#security">security</a> that applies to any Kafka deployment, this section focuses on additional considerations for multi-tenant environments.
+  </p>
+
+  <p>
+Security settings for Kafka fall into three main categories, which are similar to how administrators would secure other client-server data systems, like relational databases and traditional messaging systems.
+  </p>
+
+  <ol>
+    <li><strong>Encryption</strong> of data transferred between Kafka brokers and Kafka clients, between brokers, between brokers and ZooKeeper nodes, and between brokers and other, optional tools.</li>
+    <li><strong>Authentication</strong> of connections from Kafka clients and applications to Kafka brokers, as well as connections from Kafka brokers to ZooKeeper nodes.</li>
+    <li><strong>Authorization</strong> of client operations such as creating, deleting, and altering the configuration of topics; writing events to or reading events from a topic; creating and deleting ACLs.</li>

Review comment:
       Yeah, I can add a note. But it's unfortunate that there's essentially zero coverage in the AK docs on how to use these.
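
Side note, not part of the PR diff: the hunk above suggests disabling `auto.create.topics.enable` and creating topics through a controlled, scripted process. A minimal sketch of such a gate is shown below, assuming the `<organization>.<team>.<dataset>.<event-name>` structure from the hunk. The helper names, the allowed character set (lowercase letters, digits, hyphens), and the partition and replication values are hypothetical choices made for illustration, not anything Kafka itself prescribes.

```shell
# Hypothetical helper: succeeds only if the topic name has exactly four
# dot-separated segments, each made of lowercase letters, digits, or hyphens
# (an assumed convention; Kafka itself allows more characters).
is_valid_topic_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9-]+(\.[a-z0-9-]+){3}$'
}

# Prints (does not execute) the kafka-topics.sh command for a valid name,
# and rejects any name that violates the convention.
create_topic_cmd() {
  if is_valid_topic_name "$1"; then
    echo "bin/kafka-topics.sh --bootstrap-server broker1:9092 --create --topic $1 --partitions 3 --replication-factor 3"
  else
    echo "rejected: $1 does not match <organization>.<team>.<dataset>.<event-name>" >&2
    return 1
  fi
}

create_topic_cmd acme.infosec.telemetry.logins
```

A name such as `my-topic` is rejected before any command is produced. Paired with prefixed ACLs, a gate like this keeps every new topic inside a known tenant prefix.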





[GitHub] [kafka-site] miguno commented on pull request #334: KAFKA-12393: Document multi-tenancy considerations

Posted by GitBox <gi...@apache.org>.

miguno commented on pull request #334:
URL: https://github.com/apache/kafka-site/pull/334#issuecomment-790412618


   Thanks all for reviewing, much appreciated!



[GitHub] [kafka-site] miguno commented on pull request #334: KAFKA-12393: Document multi-tenancy considerations

Posted by GitBox <gi...@apache.org>.

miguno commented on pull request #334:
URL: https://github.com/apache/kafka-site/pull/334#issuecomment-789009875


   PR updated with reviewer feedback.



[GitHub] [kafka-site] miguno commented on pull request #334: KAFKA-12393: Document multi-tenancy considerations

Posted by GitBox <gi...@apache.org>.

miguno commented on pull request #334:
URL: https://github.com/apache/kafka-site/pull/334#issuecomment-788696584


   @bbejeck wrote in https://github.com/apache/kafka-site/pull/334#pullrequestreview-600990037:
   > Also, @miguno, can you create an identical PR to go against docs in AK trunk?
   
   Yes, I will do this once the content review of this PR is completed.



[GitHub] [kafka-site] bbejeck merged pull request #334: KAFKA-12393: Document multi-tenancy considerations

Posted by GitBox <gi...@apache.org>.

bbejeck merged pull request #334:
URL: https://github.com/apache/kafka-site/pull/334


   

