Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/06/15 07:17:08 UTC

[GitHub] [pulsar] momo-jun opened a new pull request, #16069: [feature][doc] Add docs for failure domain + anti-affinity namespace

momo-jun opened a new pull request, #16069:
URL: https://github.com/apache/pulsar/pull/16069

   ### Modifications
   
   1. Fix #15629 - Add docs for anti-affinity namespace and failure domain.
   2. Reorganize the Load Balance topic with a few more updates:
       a) Rename the topic title, remove the original heading2 titles, and elevate all the subheadings.
       b) Move the "unload" section behind the "auto shed load" section.
   3. Fix #15904 - Clarify the scope of `loadBalancerBrokerOverloadedThresholdPercentage`.
   
   Will attach the screenshots of my local build after a review.
   
   
   ### Documentation
   
   - [ ] `doc` 
   


-- 
This is an automated message from the Apache Git Service.


[GitHub] [pulsar] Anonymitaet commented on a diff in pull request #16069: [feature][doc] Add docs for failure domain + anti-affinity namespace

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on code in PR #16069:
URL: https://github.com/apache/pulsar/pull/16069#discussion_r901259782


##########
site2/website/versioned_docs/version-2.10.0/administration-load-balance.md:
##########
@@ -1,111 +1,82 @@
 ---
 id: administration-load-balance
-title: Pulsar load balance
+title: Load balance across brokers
 sidebar_label: "Load balance"
 original_id: administration-load-balance
 ---
 
-## Load balance across Pulsar brokers
 
-Pulsar is an horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, which is a core requirement.
+Pulsar is a horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, which is a core requirement.
 
-You can use multiple settings and tools to control the traffic distribution which require a bit of context to understand how the traffic is managed in Pulsar. Though, in most cases, the core requirement mentioned above is true out of the box and you should not worry about it. 
+You can use multiple settings and tools to control the traffic distribution which requires a bit of context to understand how the traffic is managed in Pulsar. Though in most cases, the core requirement mentioned above is true out of the box and you should not worry about it. 
 
-## Pulsar load manager architecture
+The following sections introduce how the load-balanced assignments work across Pulsar brokers and how you can leverage the framework to adjust.
 
-The following part introduces the basic architecture of the Pulsar load manager.
+## Dynamic assignments
 
-### Assign topics to brokers dynamically
+Topics are dynamically assigned to brokers based on the load conditions of all brokers in the cluster. The assignment of topics to brokers is not done at the topic level but at the **bundle** level (a higher level). Instead of individual topic assignments, each broker takes ownership of a subset of the topics for a namespace. This subset is called a bundle and effectively this subset is a sharding mechanism. 
 
-Topics are dynamically assigned to brokers based on the load conditions of all brokers in the cluster.
+In other words, each namespace is an "administrative" unit and sharded into a list of bundles, with each bundle comprising a portion of the overall hash range of the namespace. Topics are assigned to a particular bundle by taking the hash of the topic name and checking in which bundle the hash falls. Each bundle is independent of the others and thus is independently assigned to different brokers.
 
-When a client starts using new topics that are not assigned to any broker, a process is triggered to choose the best suited broker to acquire ownership of these topics according to the load conditions. 
+The benefit of the assignment granularity is to amortize the amount of information that you need to keep track of. Based on CPU, memory, traffic load, and other indexes, topics are assigned to a particular broker dynamically. For example: 
+* When a client starts using new topics that are not assigned to any broker, a process is triggered to choose the best-suited broker to acquire ownership of these topics according to the load conditions. 
+* If the broker owning a topic becomes overloaded, the topic is reassigned to a less-loaded broker.
+* If the broker owning a topic crashes, the topic is reassigned to another active broker.
 
-In case of partitioned topics, different partitions are assigned to different brokers. Here "topic" means either a non-partitioned topic or one partition of a topic.
+:::tip
 
-The assignment is "dynamic" because the assignment changes quickly. For example, if the broker owning the topic crashes, the topic is reassigned immediately to another broker. Another scenario is that the broker owning the topic becomes overloaded. In this case, the topic is reassigned to a less loaded broker.
+For partitioned topics, different partitions are assigned to different brokers. Here "topic" means either a non-partitioned topic or one partition of a topic.
 
-The stateless nature of brokers makes the dynamic assignment possible, so you can quickly expand or shrink the cluster based on usage.
+:::
 
-#### Assignment granularity
+## Create namespaces with assigned bundles
 
-The assignment of topics or partitions to brokers is not done at the topics or partitions level, but done at the Bundle level (a higher level). The reason is to amortize the amount of information that you need to keep track. Based on CPU, memory, traffic load and other indexes, topics are assigned to a particular broker dynamically. 
+When you create a new namespace, a number of bundles are assigned to the namespace. You can set this number in the `conf/broker.conf` file:
 
-Instead of individual topic or partition assignment, each broker takes ownership of a subset of the topics for a namespace. This subset is called a "*bundle*" and effectively this subset is a sharding mechanism.
+```conf
 
-The namespace is the "administrative" unit: many config knobs or operations are done at the namespace level.
-
-For assignment, a namespaces is sharded into a list of "bundles", with each bundle comprising a portion of overall hash range of the namespace.
-
-Topics are assigned to a particular bundle by taking the hash of the topic name and checking in which bundle the hash falls into.
-
-Each bundle is independent of the others and thus is independently assigned to different brokers.
-
-### Create namespaces and bundles
-
-When you create a new namespace, the new namespace sets to use the default number of bundles. You can set this in `conf/broker.conf`:
-
-```properties
-
-# When a namespace is created without specifying the number of bundle, this
+# When a namespace is created without specifying the number of bundles, this
 # value will be used as the default
 defaultNumberOfNamespaceBundles=4
 
 ```
 
-You can either change the system default, or override it when you create a new namespace:
+Alternatively, you can override the value when you create a new namespace using [Pulsar admin](/tools/pulsar-admin/):
 
 ```shell
 
-$ bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters us-west --bundles 16
+bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters us-west --bundles 16
 
 ```
 
-With this command, you create a namespace with 16 initial bundles. Therefore the topics for this namespaces can immediately be spread across up to 16 brokers.
+With the above command, you create a namespace with 16 initial bundles. Therefore the topics for this namespace can immediately be spread across up to 16 brokers.
 
 In general, if you know the expected traffic and number of topics in advance, you had better start with a reasonable number of bundles instead of waiting for the system to auto-correct the distribution.
 
-On the same note, it is beneficial to start with more bundles than the number of brokers, because of the hashing nature of the distribution of topics into bundles. For example, for a namespace with 1000 topics, using something like 64 bundles achieves a good distribution of traffic across 16 brokers.
-
-### Unload topics and bundles
+On the same note, it is beneficial to start with more bundles than the number of brokers, due to the hashing nature of the distribution of topics into bundles. For example, for a namespace with 1000 topics, using something like 64 bundles achieves a good distribution of traffic across 16 brokers.
 
-You can "unload" a topic in Pulsar with admin operation. Unloading means to close the topics, release ownership and reassign the topics to a new broker, based on current load.
 
-When unloading happens, the client experiences a small latency blip, typically in the order of tens of milliseconds, while the topic is reassigned.
+## Split namespace bundles
 
-Unloading is the mechanism that the load-manager uses to perform the load shedding, but you can also trigger the unloading manually, for example to correct the assignments and redistribute traffic even before having any broker overloaded.
+Since the load for the topics in a bundle might change over time and predicting the load might be hard, bundle split is designed to resolve these chanllenges. The broker splits a bundle into two and the new smaller bundles can be reassigned to different brokers.

Review Comment:
   ```suggestion
   Since the load for the topics in a bundle might change over time and predicting the load might be hard, bundle split is designed to resolve these challenges. The broker splits a bundle into two and the new smaller bundles can be reassigned to different brokers.
   ```
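
For readers of this thread, the bundle mechanics described in the hunk above (topics hashed into a namespace's hash range, and a bundle split into two smaller ranges) can be pictured with a minimal sketch. This is not Pulsar's implementation: the 32-bit range, the CRC32 stand-in hash, the even midpoint split, and the helper names `make_bundles`, `bundle_index`, and `split_bundle` are all assumptions made for illustration.

```python
import bisect
import zlib

# Assumption: a 32-bit hash space; the real range and hash function may differ.
FULL_RANGE = 0x100000000


def make_bundles(n):
    """Return n + 1 boundaries that cut the hash range into n even bundles."""
    return [i * FULL_RANGE // n for i in range(n)] + [FULL_RANGE]


def bundle_index(boundaries, topic):
    """Hash the topic name and find the bundle whose range contains the hash."""
    h = zlib.crc32(topic.encode("utf-8"))  # stand-in hash, illustration only
    return bisect.bisect_right(boundaries, h) - 1


def split_bundle(boundaries, index):
    """Split one bundle at its midpoint, yielding two smaller bundles."""
    lo, hi = boundaries[index], boundaries[index + 1]
    mid = (lo + hi) // 2
    return boundaries[:index + 1] + [mid] + boundaries[index + 1:]


topic = "persistent://my-tenant/my-namespace/topic-1"
bundles = make_bundles(4)
owner = bundle_index(bundles, topic)
after_split = split_bundle(bundles, owner)
```

After the split, the topic lands in one of the two halves of its former bundle, and each half can be assigned to a different broker independently of the other.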



##########
site2/website/versioned_docs/version-2.10.0/administration-load-balance.md:
##########
@@ -124,91 +95,184 @@ loadBalancerNamespaceMaximumBundles=128
 
 ```
 
-### Shed load automatically
+## Shed load automatically
 
-The support for automatic load shedding is available in the load manager of Pulsar. This means that whenever the system recognizes a particular broker is overloaded, the system forces some traffic to be reassigned to less loaded brokers.
+The support for automatic load shedding is available in the load manager of Pulsar. This means that whenever the system recognizes a particular broker is overloaded, the system forces some traffic to be reassigned to less-loaded brokers.
 
 When a broker is identified as overloaded, the broker forces to "unload" a subset of the bundles, the ones with higher traffic, that make up for the overload percentage.
 
-For example, the default threshold is 85% and if a broker is over quota at 95% CPU usage, then the broker unloads the percent difference plus a 5% margin: `(95% - 85%) + 5% = 15%`.
-
-Given the selection of bundles to offload is based on traffic (as a proxy measure for cpu, network and memory), broker unloads bundles for at least 15% of traffic.
+For example, the default threshold is 85% and if a broker is over quota at 95% CPU usage, then the broker unloads the percent difference plus a 5% margin: `(95% - 85%) + 5% = 15%`. Given the selection of bundles to unload is based on traffic (as a proxy measure for CPU, network, and memory), the broker unloads bundles for at least 15% of traffic.
 
-The automatic load shedding is enabled by default and you can disable the automatic load shedding with this setting:
+:::tip
 
-```properties
+* The automatic load shedding is enabled by default. To disable it, you can set `loadBalancerSheddingEnabled` to `false`.
+* Besides the automatic load shedding, you can [manually unload bundles](#unload-topics-and-bundles).
 
-# Enable/disable automatic bundle unloading for load-shedding
-loadBalancerSheddingEnabled=true
-
-```
+:::
 
 Additional settings that apply to shedding:
 
-```properties
+```conf
 
 # Load shedding interval. Broker periodically checks whether some traffic should be offload from
 # some over-loaded broker to other under-loaded brokers
 loadBalancerSheddingIntervalMinutes=1
 
-# Prevent the same topics to be shed and moved to other brokers more that once within this timeframe
+# Prevent the same topics to be shed and moved to other brokers more than once within this timeframe
 loadBalancerSheddingGracePeriodMinutes=30
 
 ```
 
-Pulsar supports the following types of shedding strategies. From Pulsar 2.10, the **default** shedding strategy is `ThresholdShedder`.
+Pulsar supports the following types of automatic load shedding strategies. 
+* [ThresholdShedder](#thresholdshedder)
+* [OverloadShedder](#overloadshedder)
+* [UniformLoadShedder](#uniformloadshedder)
 
-##### ThresholdShedder
-This strategy tends to shed the bundles if any broker's usage is above the configured threshold. It does this by first computing the average resource usage per broker for the whole cluster. The resource usage for each broker is calculated using the following method: LocalBrokerData#getMaxResourceUsageWithWeight. The weights for each resource are configurable. Historical observations are included in the running average based on the broker's setting for loadBalancerHistoryResourcePercentage. Once the average resource usage is calculated, a broker's current/historical usage is compared to the average broker usage. If a broker's usage is greater than the average usage per broker plus the loadBalancerBrokerThresholdShedderPercentage, this load shedder proposes removing enough bundles to bring the unloaded broker 5% below the current average broker usage. Note that recently unloaded bundles are not unloaded again. Configure broker with below value to use this strategy.
-`loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder`
+:::note
+
+* From Pulsar 2.10, the **default** shedding strategy is `ThresholdShedder`.
+* You need to restart brokers if the shedding strategy is [dynamically updated](admin-api-brokers.md/#dynamic-broker-configuration). 
+
+:::
+
+### ThresholdShedder
+This strategy tends to shed the bundles if any broker's usage is above the configured threshold. It does this by first computing the average resource usage per broker for the whole cluster. The resource usage for each broker is calculated using the following method `LocalBrokerData#getMaxResourceUsageWithWeight`. Historical observations are included in the running average based on the broker's setting for `loadBalancerHistoryResourcePercentage`. Once the average resource usage is calculated, a broker's current/historical usage is compared to the average broker usage. If a broker's usage is greater than the average usage per broker plus the `loadBalancerBrokerThresholdShedderPercentage`, this load shedder proposes removing enough bundles to bring the unloaded broker 5% below the current average broker usage. Note that recently unloaded bundles are not unloaded again. 
 
 ![Shedding strategy - ThresholdShedder](/assets/ThresholdShedder.png)
 
-##### OverloadShedder
-This strategy will attempt to shed exactly one bundle on brokers which are overloaded, that is, whose maximum system resource usage exceeds loadBalancerBrokerOverloadedThresholdPercentage. To see which resources are considered when determining the maximum system resource. A bundle is recommended for unloading off that broker if and only if the following conditions hold: The broker has at least two bundles assigned and the broker has at least one bundle that has not been unloaded recently according to LoadBalancerSheddingGracePeriodMinutes. The unloaded bundle will be the most expensive bundle in terms of message rate that has not been recently unloaded. Note that this strategy does not take into account "underloaded" brokers when determining which bundles to unload. If you are looking for a strategy that spreads load evenly across all brokers, see ThresholdShedder. Configure broker with below value to use this strategy.
-`loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.OverloadShedder`
+To use the `ThresholdShedder` strategy, configure brokers with this value.
+`loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder`
+
+You can configure the weights for each resource per broker in the `conf/broker.conf` file. 

Review Comment:
   Same comment as above.



##########
site2/website/versioned_docs/version-2.10.0/administration-load-balance.md:
##########
@@ -124,91 +95,184 @@ loadBalancerNamespaceMaximumBundles=128
 
 ```
 
-### Shed load automatically
+## Shed load automatically
 
-The support for automatic load shedding is available in the load manager of Pulsar. This means that whenever the system recognizes a particular broker is overloaded, the system forces some traffic to be reassigned to less loaded brokers.
+The support for automatic load shedding is available in the load manager of Pulsar. This means that whenever the system recognizes a particular broker is overloaded, the system forces some traffic to be reassigned to less-loaded brokers.
 
 When a broker is identified as overloaded, the broker forces to "unload" a subset of the bundles, the ones with higher traffic, that make up for the overload percentage.
 
-For example, the default threshold is 85% and if a broker is over quota at 95% CPU usage, then the broker unloads the percent difference plus a 5% margin: `(95% - 85%) + 5% = 15%`.
-
-Given the selection of bundles to offload is based on traffic (as a proxy measure for cpu, network and memory), broker unloads bundles for at least 15% of traffic.
+For example, the default threshold is 85% and if a broker is over quota at 95% CPU usage, then the broker unloads the percent difference plus a 5% margin: `(95% - 85%) + 5% = 15%`. Given the selection of bundles to unload is based on traffic (as a proxy measure for CPU, network, and memory), the broker unloads bundles for at least 15% of traffic.
 
-The automatic load shedding is enabled by default and you can disable the automatic load shedding with this setting:
+:::tip
 
-```properties
+* The automatic load shedding is enabled by default. To disable it, you can set `loadBalancerSheddingEnabled` to `false`.
+* Besides the automatic load shedding, you can [manually unload bundles](#unload-topics-and-bundles).
 
-# Enable/disable automatic bundle unloading for load-shedding
-loadBalancerSheddingEnabled=true
-
-```
+:::
 
 Additional settings that apply to shedding:
 
-```properties
+```conf
 
 # Load shedding interval. Broker periodically checks whether some traffic should be offload from
 # some over-loaded broker to other under-loaded brokers
 loadBalancerSheddingIntervalMinutes=1
 
-# Prevent the same topics to be shed and moved to other brokers more that once within this timeframe
+# Prevent the same topics to be shed and moved to other brokers more than once within this timeframe
 loadBalancerSheddingGracePeriodMinutes=30
 
 ```
 
-Pulsar supports the following types of shedding strategies. From Pulsar 2.10, the **default** shedding strategy is `ThresholdShedder`.
+Pulsar supports the following types of automatic load shedding strategies. 
+* [ThresholdShedder](#thresholdshedder)
+* [OverloadShedder](#overloadshedder)
+* [UniformLoadShedder](#uniformloadshedder)
 
-##### ThresholdShedder
-This strategy tends to shed the bundles if any broker's usage is above the configured threshold. It does this by first computing the average resource usage per broker for the whole cluster. The resource usage for each broker is calculated using the following method: LocalBrokerData#getMaxResourceUsageWithWeight. The weights for each resource are configurable. Historical observations are included in the running average based on the broker's setting for loadBalancerHistoryResourcePercentage. Once the average resource usage is calculated, a broker's current/historical usage is compared to the average broker usage. If a broker's usage is greater than the average usage per broker plus the loadBalancerBrokerThresholdShedderPercentage, this load shedder proposes removing enough bundles to bring the unloaded broker 5% below the current average broker usage. Note that recently unloaded bundles are not unloaded again. Configure broker with below value to use this strategy.
-`loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder`
+:::note
+
+* From Pulsar 2.10, the **default** shedding strategy is `ThresholdShedder`.
+* You need to restart brokers if the shedding strategy is [dynamically updated](admin-api-brokers.md/#dynamic-broker-configuration). 
+
+:::
+
+### ThresholdShedder
+This strategy tends to shed the bundles if any broker's usage is above the configured threshold. It does this by first computing the average resource usage per broker for the whole cluster. The resource usage for each broker is calculated using the following method `LocalBrokerData#getMaxResourceUsageWithWeight`. Historical observations are included in the running average based on the broker's setting for `loadBalancerHistoryResourcePercentage`. Once the average resource usage is calculated, a broker's current/historical usage is compared to the average broker usage. If a broker's usage is greater than the average usage per broker plus the `loadBalancerBrokerThresholdShedderPercentage`, this load shedder proposes removing enough bundles to bring the unloaded broker 5% below the current average broker usage. Note that recently unloaded bundles are not unloaded again. 
 
 ![Shedding strategy - ThresholdShedder](/assets/ThresholdShedder.png)
 
-##### OverloadShedder
-This strategy will attempt to shed exactly one bundle on brokers which are overloaded, that is, whose maximum system resource usage exceeds loadBalancerBrokerOverloadedThresholdPercentage. To see which resources are considered when determining the maximum system resource. A bundle is recommended for unloading off that broker if and only if the following conditions hold: The broker has at least two bundles assigned and the broker has at least one bundle that has not been unloaded recently according to LoadBalancerSheddingGracePeriodMinutes. The unloaded bundle will be the most expensive bundle in terms of message rate that has not been recently unloaded. Note that this strategy does not take into account "underloaded" brokers when determining which bundles to unload. If you are looking for a strategy that spreads load evenly across all brokers, see ThresholdShedder. Configure broker with below value to use this strategy.
-`loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.OverloadShedder`
+To use the `ThresholdShedder` strategy, configure brokers with this value.
+`loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder`
+
+You can configure the weights for each resource per broker in the `conf/broker.conf` file. 
+
+```conf
+
+# The BandWithIn usage weight when calculating new resource usage.
+loadBalancerBandwithInResourceWeight=1.0
+
+# The BandWithOut usage weight when calculating new resource usage.
+loadBalancerBandwithOutResourceWeight=1.0
+
+# The CPU usage weight when calculating new resource usage.
+loadBalancerCPUResourceWeight=1.0
+
+# The heap memory usage weight when calculating new resource usage.
+loadBalancerMemoryResourceWeight=1.0
+
+# The direct memory usage weight when calculating new resource usage.
+loadBalancerDirectMemoryResourceWeight=1.0
+
+```
+
+### OverloadShedder
+This strategy will attempt to shed exactly one bundle on brokers which are overloaded, that is, whose maximum system resource usage exceeds [`loadBalancerBrokerOverloadedThresholdPercentage`](#broker-overload-thresholds). To see which resources are considered when determining the maximum system resource. A bundle is recommended for unloading off that broker if and only if the following conditions hold: The broker has at least two bundles assigned and the broker has at least one bundle that has not been unloaded recently according to LoadBalancerSheddingGracePeriodMinutes. The unloaded bundle will be the most expensive bundle in terms of message rate that has not been recently unloaded. Note that this strategy does not take into account "underloaded" brokers when determining which bundles to unload. If you are looking for a strategy that spreads load evenly across all brokers, see ThresholdShedder. 

Review Comment:
   ```suggestion
   This strategy attempts to shed exactly one bundle on brokers which are overloaded, that is, whose maximum system resource usage exceeds [`loadBalancerBrokerOverloadedThresholdPercentage`](#broker-overload-thresholds). To see which resources are considered when determining the maximum system resource. A bundle is recommended for unloading off that broker if and only if the following conditions hold: The broker has at least two bundles assigned and the broker has at least one bundle that has not been unloaded recently according to LoadBalancerSheddingGracePeriodMinutes. The unloaded bundle will be the most expensive bundle in terms of message rate that has not been recently unloaded. Note that this strategy does not take into account "underloaded" brokers when determining which bundles to unload. If you are looking for a strategy that spreads load evenly across all brokers, see ThresholdShedder. 
   ```
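
As a reading aid for the paragraph in this suggestion, the selection rule it describes could be sketched as follows. This is a hedged illustration, not the `OverloadShedder` source: the function name `pick_bundle_to_unload` and its parameters are hypothetical, and the inputs (usage percentage, per-bundle message rates, recently unloaded set) are assumed to be available already.

```python
def pick_bundle_to_unload(max_usage_pct, threshold_pct, bundle_msg_rates, recently_unloaded):
    """Return the bundle to unload from one broker, or None if nothing qualifies.

    max_usage_pct: the broker's maximum system resource usage, in percent.
    threshold_pct: value of loadBalancerBrokerOverloadedThresholdPercentage.
    bundle_msg_rates: mapping of bundle name to message rate on this broker.
    recently_unloaded: bundles unloaded within the shedding grace period.
    """
    # Only overloaded brokers are considered.
    if max_usage_pct <= threshold_pct:
        return None
    # The broker must have at least two bundles assigned.
    if len(bundle_msg_rates) < 2:
        return None
    # Skip bundles that were unloaded recently.
    candidates = {
        name: rate
        for name, rate in bundle_msg_rates.items()
        if name not in recently_unloaded
    }
    if not candidates:
        return None
    # Exactly one bundle is shed: the most expensive one by message rate.
    return max(candidates, key=candidates.get)


choice = pick_bundle_to_unload(95, 85, {"a": 100, "b": 300, "c": 200}, {"b"})
```

Here `b` has the highest message rate but was unloaded recently, so the sketch picks `c`. It never looks at other brokers, which matches the note that this strategy ignores "underloaded" brokers.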



##########
site2/website/versioned_docs/version-2.10.0/administration-load-balance.md:
##########
@@ -1,111 +1,82 @@
 ---
 id: administration-load-balance
-title: Pulsar load balance
+title: Load balance across brokers
 sidebar_label: "Load balance"
 original_id: administration-load-balance
 ---
 
-## Load balance across Pulsar brokers
 
-Pulsar is an horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, which is a core requirement.
+Pulsar is a horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, which is a core requirement.
 
-You can use multiple settings and tools to control the traffic distribution which require a bit of context to understand how the traffic is managed in Pulsar. Though, in most cases, the core requirement mentioned above is true out of the box and you should not worry about it. 
+You can use multiple settings and tools to control the traffic distribution which requires a bit of context to understand how the traffic is managed in Pulsar. Though in most cases, the core requirement mentioned above is true out of the box and you should not worry about it. 
 
-## Pulsar load manager architecture
+The following sections introduce how the load-balanced assignments work across Pulsar brokers and how you can leverage the framework to adjust.
 
-The following part introduces the basic architecture of the Pulsar load manager.
+## Dynamic assignments
 
-### Assign topics to brokers dynamically
+Topics are dynamically assigned to brokers based on the load conditions of all brokers in the cluster. The assignment of topics to brokers is not done at the topic level but at the **bundle** level (a higher level). Instead of individual topic assignments, each broker takes ownership of a subset of the topics for a namespace. This subset is called a bundle and effectively this subset is a sharding mechanism. 
 
-Topics are dynamically assigned to brokers based on the load conditions of all brokers in the cluster.
+In other words, each namespace is an "administrative" unit and sharded into a list of bundles, with each bundle comprising a portion of the overall hash range of the namespace. Topics are assigned to a particular bundle by taking the hash of the topic name and checking in which bundle the hash falls. Each bundle is independent of the others and thus is independently assigned to different brokers.
 
-When a client starts using new topics that are not assigned to any broker, a process is triggered to choose the best suited broker to acquire ownership of these topics according to the load conditions. 
+The benefit of the assignment granularity is to amortize the amount of information that you need to keep track of. Based on CPU, memory, traffic load, and other indexes, topics are assigned to a particular broker dynamically. For example: 
+* When a client starts using new topics that are not assigned to any broker, a process is triggered to choose the best-suited broker to acquire ownership of these topics according to the load conditions. 
+* If the broker owning a topic becomes overloaded, the topic is reassigned to a less-loaded broker.
+* If the broker owning a topic crashes, the topic is reassigned to another active broker.
 
-In case of partitioned topics, different partitions are assigned to different brokers. Here "topic" means either a non-partitioned topic or one partition of a topic.
+:::tip
 
-The assignment is "dynamic" because the assignment changes quickly. For example, if the broker owning the topic crashes, the topic is reassigned immediately to another broker. Another scenario is that the broker owning the topic becomes overloaded. In this case, the topic is reassigned to a less loaded broker.
+For partitioned topics, different partitions are assigned to different brokers. Here "topic" means either a non-partitioned topic or one partition of a topic.
 
-The stateless nature of brokers makes the dynamic assignment possible, so you can quickly expand or shrink the cluster based on usage.
+:::
 
-#### Assignment granularity
+## Create namespaces with assigned bundles
 
-The assignment of topics or partitions to brokers is not done at the topics or partitions level, but done at the Bundle level (a higher level). The reason is to amortize the amount of information that you need to keep track. Based on CPU, memory, traffic load and other indexes, topics are assigned to a particular broker dynamically. 
+When you create a new namespace, a number of bundles are assigned to the namespace. You can set this number in the `conf/broker.conf` file:
 
-Instead of individual topic or partition assignment, each broker takes ownership of a subset of the topics for a namespace. This subset is called a "*bundle*" and effectively this subset is a sharding mechanism.
+```conf
 
-The namespace is the "administrative" unit: many config knobs or operations are done at the namespace level.
-
-For assignment, a namespaces is sharded into a list of "bundles", with each bundle comprising a portion of overall hash range of the namespace.
-
-Topics are assigned to a particular bundle by taking the hash of the topic name and checking in which bundle the hash falls into.
-
-Each bundle is independent of the others and thus is independently assigned to different brokers.
-
-### Create namespaces and bundles
-
-When you create a new namespace, the new namespace sets to use the default number of bundles. You can set this in `conf/broker.conf`:
-
-```properties
-
-# When a namespace is created without specifying the number of bundle, this
+# When a namespace is created without specifying the number of bundles, this
 # value will be used as the default
 defaultNumberOfNamespaceBundles=4
 
 ```
 
-You can either change the system default, or override it when you create a new namespace:
+Alternatively, you can override the value when you create a new namespace using [Pulsar admin](/tools/pulsar-admin/):
 
 ```shell
 
-$ bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters us-west --bundles 16
+bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters us-west --bundles 16
 
 ```
 
-With this command, you create a namespace with 16 initial bundles. Therefore the topics for this namespaces can immediately be spread across up to 16 brokers.
+With the above command, you create a namespace with 16 initial bundles. Therefore the topics for this namespace can immediately be spread across up to 16 brokers.
 
 In general, if you know the expected traffic and number of topics in advance, you had better start with a reasonable number of bundles instead of waiting for the system to auto-correct the distribution.
 
-On the same note, it is beneficial to start with more bundles than the number of brokers, because of the hashing nature of the distribution of topics into bundles. For example, for a namespace with 1000 topics, using something like 64 bundles achieves a good distribution of traffic across 16 brokers.
-
-### Unload topics and bundles
+On the same note, it is beneficial to start with more bundles than the number of brokers, due to the hashing nature of the distribution of topics into bundles. For example, for a namespace with 1000 topics, using something like 64 bundles achieves a good distribution of traffic across 16 brokers.
 
-You can "unload" a topic in Pulsar with admin operation. Unloading means to close the topics, release ownership and reassign the topics to a new broker, based on current load.
 
-When unloading happens, the client experiences a small latency blip, typically in the order of tens of milliseconds, while the topic is reassigned.
+## Split namespace bundles
 
-Unloading is the mechanism that the load-manager uses to perform the load shedding, but you can also trigger the unloading manually, for example to correct the assignments and redistribute traffic even before having any broker overloaded.
+Since the load for the topics in a bundle might change over time and predicting the load might be hard, bundle split is designed to resolve these challenges. The broker splits a bundle into two and the new smaller bundles can be reassigned to different brokers.
 
-Unloading a topic has no effect on the assignment, but just closes and reopens the particular topic:
+Pulsar supports the following two bundle split algorithms:
+* `range_equally_divide`: split the bundle into two parts with the same hash range size.
+* `topic_count_equally_divide`: split the bundle into two parts with the same number of topics.
 
-```shell
+To enable bundle split, you need to configure the following settings in the `broker.conf` file, and set `defaultNamespaceBundleSplitAlgorithm` based on your needs.

Review Comment:
   Can we configure them in `standalone.conf`?
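
For readers following the hunk above, the difference between the two split algorithms it lists can be sketched roughly as follows. This is an illustrative Python model, not Pulsar's implementation: the function names, the integer hash range, and the fallback for bundles with fewer than two topics are all assumptions made for the example.

```python
def split_range_equally(lo, hi):
    """Model of `range_equally_divide`: cut the hash range [lo, hi) at its midpoint."""
    return lo + (hi - lo) // 2


def split_topic_count_equally(lo, hi, topic_hashes):
    """Model of `topic_count_equally_divide`: pick a boundary so that each
    half of [lo, hi) holds about the same number of topics."""
    inside = sorted(h for h in topic_hashes if lo <= h < hi)
    if len(inside) < 2:
        # Assumption for this sketch: nothing to balance, so fall back to the midpoint.
        return split_range_equally(lo, hi)
    # Topics with hash < boundary go to the left half, the rest to the right half.
    return inside[len(inside) // 2]


def count_halves(lo, hi, boundary, topic_hashes):
    """Return (topics in [lo, boundary), topics in [boundary, hi))."""
    left = sum(1 for h in topic_hashes if lo <= h < boundary)
    right = sum(1 for h in topic_hashes if boundary <= h < hi)
    return left, right


# Skewed example: most topics hash into the low end of the bundle's range.
hashes = [10, 20, 30, 40, 50, 60, 900, 950]
lo, hi = 0, 1000

range_boundary = split_range_equally(lo, hi)
count_boundary = split_topic_count_equally(lo, hi, hashes)

print("range split:", range_boundary, count_halves(lo, hi, range_boundary, hashes))
print("count split:", count_boundary, count_halves(lo, hi, count_boundary, hashes))
```

With skewed hashes like these, the midpoint split leaves the two halves holding different numbers of topics, while the count-based split evens them out at the cost of unequal hash ranges.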





[GitHub] [pulsar] Demogorgon314 commented on a diff in pull request #16069: [feature][doc] Add docs for failure domain + anti-affinity namespace

Posted by GitBox <gi...@apache.org>.
Demogorgon314 commented on code in PR #16069:
URL: https://github.com/apache/pulsar/pull/16069#discussion_r901510399


##########
site2/website/versioned_docs/version-2.10.0/administration-load-balance.md:
##########
@@ -1,111 +1,82 @@
 ---
 id: administration-load-balance
-title: Pulsar load balance
+title: Load balance across brokers
 sidebar_label: "Load balance"
 original_id: administration-load-balance
 ---
 
-## Load balance across Pulsar brokers
 
-Pulsar is an horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, which is a core requirement.
+Pulsar is a horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, which is a core requirement.
 
-You can use multiple settings and tools to control the traffic distribution which require a bit of context to understand how the traffic is managed in Pulsar. Though, in most cases, the core requirement mentioned above is true out of the box and you should not worry about it. 
+You can use multiple settings and tools to control the traffic distribution which requires a bit of context to understand how the traffic is managed in Pulsar. Though in most cases, the core requirement mentioned above is true out of the box and you should not worry about it. 
 
-## Pulsar load manager architecture
+The following sections introduce how the load-balanced assignments work across Pulsar brokers and how you can leverage the framework to adjust.
 
-The following part introduces the basic architecture of the Pulsar load manager.
+## Dynamic assignments
 
-### Assign topics to brokers dynamically
+Topics are dynamically assigned to brokers based on the load conditions of all brokers in the cluster. The assignment of topics to brokers is not done at the topic level but at the **bundle** level (a higher level). Instead of individual topic assignments, each broker takes ownership of a subset of the topics for a namespace. This subset is called a bundle and effectively this subset is a sharding mechanism. 
 
-Topics are dynamically assigned to brokers based on the load conditions of all brokers in the cluster.
+In other words, each namespace is an "administrative" unit and sharded into a list of bundles, with each bundle comprising a portion of the overall hash range of the namespace. Topics are assigned to a particular bundle by taking the hash of the topic name and checking in which bundle the hash falls. Each bundle is independent of the others and thus is independently assigned to different brokers.
 
-When a client starts using new topics that are not assigned to any broker, a process is triggered to choose the best suited broker to acquire ownership of these topics according to the load conditions. 
+The benefit of the assignment granularity is to amortize the amount of information that you need to keep track of. Based on CPU, memory, traffic load, and other indexes, topics are assigned to a particular broker dynamically. For example: 
+* When a client starts using new topics that are not assigned to any broker, a process is triggered to choose the best-suited broker to acquire ownership of these topics according to the load conditions. 
+* If the broker owning a topic becomes overloaded, the topic is reassigned to a less-loaded broker.
+* If the broker owning a topic crashes, the topic is reassigned to another active broker.
 
-In case of partitioned topics, different partitions are assigned to different brokers. Here "topic" means either a non-partitioned topic or one partition of a topic.
+:::tip
 
-The assignment is "dynamic" because the assignment changes quickly. For example, if the broker owning the topic crashes, the topic is reassigned immediately to another broker. Another scenario is that the broker owning the topic becomes overloaded. In this case, the topic is reassigned to a less loaded broker.
+For partitioned topics, different partitions are assigned to different brokers. Here "topic" means either a non-partitioned topic or one partition of a topic.
 
-The stateless nature of brokers makes the dynamic assignment possible, so you can quickly expand or shrink the cluster based on usage.
+:::
 
-#### Assignment granularity
+## Create namespaces with assigned bundles
 
-The assignment of topics or partitions to brokers is not done at the topics or partitions level, but done at the Bundle level (a higher level). The reason is to amortize the amount of information that you need to keep track. Based on CPU, memory, traffic load and other indexes, topics are assigned to a particular broker dynamically. 
+When you create a new namespace, a number of bundles are assigned to the namespace. You can set this number in the `conf/broker.conf` file:
 
-Instead of individual topic or partition assignment, each broker takes ownership of a subset of the topics for a namespace. This subset is called a "*bundle*" and effectively this subset is a sharding mechanism.
+```conf
 
-The namespace is the "administrative" unit: many config knobs or operations are done at the namespace level.
-
-For assignment, a namespaces is sharded into a list of "bundles", with each bundle comprising a portion of overall hash range of the namespace.
-
-Topics are assigned to a particular bundle by taking the hash of the topic name and checking in which bundle the hash falls into.
-
-Each bundle is independent of the others and thus is independently assigned to different brokers.
-
-### Create namespaces and bundles
-
-When you create a new namespace, the new namespace sets to use the default number of bundles. You can set this in `conf/broker.conf`:
-
-```properties
-
-# When a namespace is created without specifying the number of bundle, this
+# When a namespace is created without specifying the number of bundles, this
 # value will be used as the default
 defaultNumberOfNamespaceBundles=4
 
 ```
 
-You can either change the system default, or override it when you create a new namespace:
+Alternatively, you can override the value when you create a new namespace using [Pulsar admin](/tools/pulsar-admin/):
 
 ```shell
 
-$ bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters us-west --bundles 16
+bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters us-west --bundles 16
 
 ```
 
-With this command, you create a namespace with 16 initial bundles. Therefore the topics for this namespaces can immediately be spread across up to 16 brokers.
+With the above command, you create a namespace with 16 initial bundles. Therefore the topics for this namespace can immediately be spread across up to 16 brokers.
 
 In general, if you know the expected traffic and number of topics in advance, you had better start with a reasonable number of bundles instead of waiting for the system to auto-correct the distribution.
 
-On the same note, it is beneficial to start with more bundles than the number of brokers, because of the hashing nature of the distribution of topics into bundles. For example, for a namespace with 1000 topics, using something like 64 bundles achieves a good distribution of traffic across 16 brokers.
-
-### Unload topics and bundles
+On the same note, it is beneficial to start with more bundles than the number of brokers, due to the hashing nature of the distribution of topics into bundles. For example, for a namespace with 1000 topics, using something like 64 bundles achieves a good distribution of traffic across 16 brokers.
 
-You can "unload" a topic in Pulsar with admin operation. Unloading means to close the topics, release ownership and reassign the topics to a new broker, based on current load.
 
-When unloading happens, the client experiences a small latency blip, typically in the order of tens of milliseconds, while the topic is reassigned.
+## Split namespace bundles
 
-Unloading is the mechanism that the load-manager uses to perform the load shedding, but you can also trigger the unloading manually, for example to correct the assignments and redistribute traffic even before having any broker overloaded.
+Since the load for the topics in a bundle might change over time and predicting the load might be hard, bundle split is designed to resolve these challenges. The broker splits a bundle into two and the new smaller bundles can be reassigned to different brokers.
 
-Unloading a topic has no effect on the assignment, but just closes and reopens the particular topic:
+Pulsar supports the following two bundle split algorithms:
+* `range_equally_divide`: split the bundle into two parts with the same hash range size.
+* `topic_count_equally_divide`: split the bundle into two parts with the same number of topics.
 
-```shell
+To enable bundle split, you need to configure the following settings in the `broker.conf` file, and set `defaultNamespaceBundleSplitAlgorithm` based on your needs.

Review Comment:
   @momo-jun Yes, you are right.





[GitHub] [pulsar] momo-jun commented on a diff in pull request #16069: [feature][doc] Add docs for failure domain + anti-affinity namespace

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #16069:
URL: https://github.com/apache/pulsar/pull/16069#discussion_r901273299


##########
site2/website/versioned_docs/version-2.10.0/administration-load-balance.md:
##########
@@ -1,111 +1,82 @@
 ---
 id: administration-load-balance
-title: Pulsar load balance
+title: Load balance across brokers
 sidebar_label: "Load balance"
 original_id: administration-load-balance
 ---
 
-## Load balance across Pulsar brokers
 
-Pulsar is an horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, which is a core requirement.
+Pulsar is a horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, which is a core requirement.
 
-You can use multiple settings and tools to control the traffic distribution which require a bit of context to understand how the traffic is managed in Pulsar. Though, in most cases, the core requirement mentioned above is true out of the box and you should not worry about it. 
+You can use multiple settings and tools to control the traffic distribution which requires a bit of context to understand how the traffic is managed in Pulsar. Though in most cases, the core requirement mentioned above is true out of the box and you should not worry about it. 
 
-## Pulsar load manager architecture
+The following sections introduce how the load-balanced assignments work across Pulsar brokers and how you can leverage the framework to adjust.
 
-The following part introduces the basic architecture of the Pulsar load manager.
+## Dynamic assignments
 
-### Assign topics to brokers dynamically
+Topics are dynamically assigned to brokers based on the load conditions of all brokers in the cluster. The assignment of topics to brokers is not done at the topic level but at the **bundle** level (a higher level). Instead of individual topic assignments, each broker takes ownership of a subset of the topics for a namespace. This subset is called a bundle and effectively this subset is a sharding mechanism. 
 
-Topics are dynamically assigned to brokers based on the load conditions of all brokers in the cluster.
+In other words, each namespace is an "administrative" unit and sharded into a list of bundles, with each bundle comprising a portion of the overall hash range of the namespace. Topics are assigned to a particular bundle by taking the hash of the topic name and checking in which bundle the hash falls. Each bundle is independent of the others and thus is independently assigned to different brokers.
 
-When a client starts using new topics that are not assigned to any broker, a process is triggered to choose the best suited broker to acquire ownership of these topics according to the load conditions. 
+The benefit of the assignment granularity is to amortize the amount of information that you need to keep track of. Based on CPU, memory, traffic load, and other indexes, topics are assigned to a particular broker dynamically. For example: 
+* When a client starts using new topics that are not assigned to any broker, a process is triggered to choose the best-suited broker to acquire ownership of these topics according to the load conditions. 
+* If the broker owning a topic becomes overloaded, the topic is reassigned to a less-loaded broker.
+* If the broker owning a topic crashes, the topic is reassigned to another active broker.
 
-In case of partitioned topics, different partitions are assigned to different brokers. Here "topic" means either a non-partitioned topic or one partition of a topic.
+:::tip
 
-The assignment is "dynamic" because the assignment changes quickly. For example, if the broker owning the topic crashes, the topic is reassigned immediately to another broker. Another scenario is that the broker owning the topic becomes overloaded. In this case, the topic is reassigned to a less loaded broker.
+For partitioned topics, different partitions are assigned to different brokers. Here "topic" means either a non-partitioned topic or one partition of a topic.
 
-The stateless nature of brokers makes the dynamic assignment possible, so you can quickly expand or shrink the cluster based on usage.
+:::
 
-#### Assignment granularity
+## Create namespaces with assigned bundles
 
-The assignment of topics or partitions to brokers is not done at the topics or partitions level, but done at the Bundle level (a higher level). The reason is to amortize the amount of information that you need to keep track. Based on CPU, memory, traffic load and other indexes, topics are assigned to a particular broker dynamically. 
+When you create a new namespace, a number of bundles are assigned to the namespace. You can set this number in the `conf/broker.conf` file:
 
-Instead of individual topic or partition assignment, each broker takes ownership of a subset of the topics for a namespace. This subset is called a "*bundle*" and effectively this subset is a sharding mechanism.
+```conf
 
-The namespace is the "administrative" unit: many config knobs or operations are done at the namespace level.
-
-For assignment, a namespaces is sharded into a list of "bundles", with each bundle comprising a portion of overall hash range of the namespace.
-
-Topics are assigned to a particular bundle by taking the hash of the topic name and checking in which bundle the hash falls into.
-
-Each bundle is independent of the others and thus is independently assigned to different brokers.
-
-### Create namespaces and bundles
-
-When you create a new namespace, the new namespace sets to use the default number of bundles. You can set this in `conf/broker.conf`:
-
-```properties
-
-# When a namespace is created without specifying the number of bundle, this
+# When a namespace is created without specifying the number of bundles, this
 # value will be used as the default
 defaultNumberOfNamespaceBundles=4
 
 ```
 
-You can either change the system default, or override it when you create a new namespace:
+Alternatively, you can override the value when you create a new namespace using [Pulsar admin](/tools/pulsar-admin/):
 
 ```shell
 
-$ bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters us-west --bundles 16
+bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters us-west --bundles 16
 
 ```
 
-With this command, you create a namespace with 16 initial bundles. Therefore the topics for this namespaces can immediately be spread across up to 16 brokers.
+With the above command, you create a namespace with 16 initial bundles. Therefore the topics for this namespace can immediately be spread across up to 16 brokers.
 
 In general, if you know the expected traffic and number of topics in advance, you had better start with a reasonable number of bundles instead of waiting for the system to auto-correct the distribution.
 
-On the same note, it is beneficial to start with more bundles than the number of brokers, because of the hashing nature of the distribution of topics into bundles. For example, for a namespace with 1000 topics, using something like 64 bundles achieves a good distribution of traffic across 16 brokers.
-
-### Unload topics and bundles
+On the same note, it is beneficial to start with more bundles than the number of brokers, due to the hashing nature of the distribution of topics into bundles. For example, for a namespace with 1000 topics, using something like 64 bundles achieves a good distribution of traffic across 16 brokers.
 
-You can "unload" a topic in Pulsar with admin operation. Unloading means to close the topics, release ownership and reassign the topics to a new broker, based on current load.
 
-When unloading happens, the client experiences a small latency blip, typically in the order of tens of milliseconds, while the topic is reassigned.
+## Split namespace bundles
 
-Unloading is the mechanism that the load-manager uses to perform the load shedding, but you can also trigger the unloading manually, for example to correct the assignments and redistribute traffic even before having any broker overloaded.
+Since the load for the topics in a bundle might change over time and predicting the load might be hard, bundle split is designed to resolve these challenges. The broker splits a bundle into two and the new smaller bundles can be reassigned to different brokers.

Review Comment:
   Nice catch!





[GitHub] [pulsar] Anonymitaet merged pull request #16069: [feature][doc] Add docs for failure domain + anti-affinity namespace

Posted by GitBox <gi...@apache.org>.
Anonymitaet merged PR #16069:
URL: https://github.com/apache/pulsar/pull/16069




[GitHub] [pulsar] momo-jun commented on a diff in pull request #16069: [feature][doc] Add docs for failure domain + anti-affinity namespace

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #16069:
URL: https://github.com/apache/pulsar/pull/16069#discussion_r901272198


##########
site2/website/versioned_docs/version-2.10.0/administration-load-balance.md:
##########
@@ -1,111 +1,82 @@
 ---
 id: administration-load-balance
-title: Pulsar load balance
+title: Load balance across brokers
 sidebar_label: "Load balance"
 original_id: administration-load-balance
 ---
 
-## Load balance across Pulsar brokers
 
-Pulsar is an horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, which is a core requirement.
+Pulsar is a horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, which is a core requirement.
 
-You can use multiple settings and tools to control the traffic distribution which require a bit of context to understand how the traffic is managed in Pulsar. Though, in most cases, the core requirement mentioned above is true out of the box and you should not worry about it. 
+You can use multiple settings and tools to control the traffic distribution which requires a bit of context to understand how the traffic is managed in Pulsar. Though in most cases, the core requirement mentioned above is true out of the box and you should not worry about it. 
 
-## Pulsar load manager architecture
+The following sections introduce how the load-balanced assignments work across Pulsar brokers and how you can leverage the framework to adjust.
 
-The following part introduces the basic architecture of the Pulsar load manager.
+## Dynamic assignments
 
-### Assign topics to brokers dynamically
+Topics are dynamically assigned to brokers based on the load conditions of all brokers in the cluster. The assignment of topics to brokers is not done at the topic level but at the **bundle** level (a higher level). Instead of individual topic assignments, each broker takes ownership of a subset of the topics for a namespace. This subset is called a bundle and effectively this subset is a sharding mechanism. 
 
-Topics are dynamically assigned to brokers based on the load conditions of all brokers in the cluster.
+In other words, each namespace is an "administrative" unit and sharded into a list of bundles, with each bundle comprising a portion of the overall hash range of the namespace. Topics are assigned to a particular bundle by taking the hash of the topic name and checking in which bundle the hash falls. Each bundle is independent of the others and thus is independently assigned to different brokers.
 
-When a client starts using new topics that are not assigned to any broker, a process is triggered to choose the best suited broker to acquire ownership of these topics according to the load conditions. 
+The benefit of the assignment granularity is to amortize the amount of information that you need to keep track of. Based on CPU, memory, traffic load, and other indexes, topics are assigned to a particular broker dynamically. For example: 
+* When a client starts using new topics that are not assigned to any broker, a process is triggered to choose the best-suited broker to acquire ownership of these topics according to the load conditions. 
+* If the broker owning a topic becomes overloaded, the topic is reassigned to a less-loaded broker.
+* If the broker owning a topic crashes, the topic is reassigned to another active broker.
 
-In case of partitioned topics, different partitions are assigned to different brokers. Here "topic" means either a non-partitioned topic or one partition of a topic.
+:::tip
 
-The assignment is "dynamic" because the assignment changes quickly. For example, if the broker owning the topic crashes, the topic is reassigned immediately to another broker. Another scenario is that the broker owning the topic becomes overloaded. In this case, the topic is reassigned to a less loaded broker.
+For partitioned topics, different partitions are assigned to different brokers. Here "topic" means either a non-partitioned topic or one partition of a topic.
 
-The stateless nature of brokers makes the dynamic assignment possible, so you can quickly expand or shrink the cluster based on usage.
+:::
 
-#### Assignment granularity
+## Create namespaces with assigned bundles
 
-The assignment of topics or partitions to brokers is not done at the topics or partitions level, but done at the Bundle level (a higher level). The reason is to amortize the amount of information that you need to keep track. Based on CPU, memory, traffic load and other indexes, topics are assigned to a particular broker dynamically. 
+When you create a new namespace, a number of bundles are assigned to the namespace. You can set this number in the `conf/broker.conf` file:
 
-Instead of individual topic or partition assignment, each broker takes ownership of a subset of the topics for a namespace. This subset is called a "*bundle*" and effectively this subset is a sharding mechanism.
+```conf
 
-The namespace is the "administrative" unit: many config knobs or operations are done at the namespace level.
-
-For assignment, a namespaces is sharded into a list of "bundles", with each bundle comprising a portion of overall hash range of the namespace.
-
-Topics are assigned to a particular bundle by taking the hash of the topic name and checking in which bundle the hash falls into.
-
-Each bundle is independent of the others and thus is independently assigned to different brokers.
-
-### Create namespaces and bundles
-
-When you create a new namespace, the new namespace sets to use the default number of bundles. You can set this in `conf/broker.conf`:
-
-```properties
-
-# When a namespace is created without specifying the number of bundle, this
+# When a namespace is created without specifying the number of bundles, this
 # value will be used as the default
 defaultNumberOfNamespaceBundles=4
 
 ```
 
-You can either change the system default, or override it when you create a new namespace:
+Alternatively, you can override the value when you create a new namespace using [Pulsar admin](/tools/pulsar-admin/):
 
 ```shell
 
-$ bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters us-west --bundles 16
+bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters us-west --bundles 16
 
 ```
 
-With this command, you create a namespace with 16 initial bundles. Therefore the topics for this namespaces can immediately be spread across up to 16 brokers.
+With the above command, you create a namespace with 16 initial bundles. Therefore the topics for this namespace can immediately be spread across up to 16 brokers.
 
 In general, if you know the expected traffic and number of topics in advance, you had better start with a reasonable number of bundles instead of waiting for the system to auto-correct the distribution.
 
-On the same note, it is beneficial to start with more bundles than the number of brokers, because of the hashing nature of the distribution of topics into bundles. For example, for a namespace with 1000 topics, using something like 64 bundles achieves a good distribution of traffic across 16 brokers.
-
-### Unload topics and bundles
+On the same note, it is beneficial to start with more bundles than the number of brokers, due to the hashing nature of the distribution of topics into bundles. For example, for a namespace with 1000 topics, using something like 64 bundles achieves a good distribution of traffic across 16 brokers.
 
-You can "unload" a topic in Pulsar with admin operation. Unloading means to close the topics, release ownership and reassign the topics to a new broker, based on current load.
 
-When unloading happens, the client experiences a small latency blip, typically in the order of tens of milliseconds, while the topic is reassigned.
+## Split namespace bundles
 
-Unloading is the mechanism that the load-manager uses to perform the load shedding, but you can also trigger the unloading manually, for example to correct the assignments and redistribute traffic even before having any broker overloaded.
+Since the load for the topics in a bundle might change over time and predicting the load might be hard, bundle split is designed to resolve these challenges. The broker splits a bundle into two and the new smaller bundles can be reassigned to different brokers.
 
-Unloading a topic has no effect on the assignment, but just closes and reopens the particular topic:
+Pulsar supports the following two bundle split algorithms:
+* `range_equally_divide`: split the bundle into two parts with the same hash range size.
+* `topic_count_equally_divide`: split the bundle into two parts with the same number of topics.
 
-```shell
+To enable bundle split, you need to configure the following settings in the `broker.conf` file, and set `defaultNamespaceBundleSplitAlgorithm` based on your needs.

Review Comment:
   I think the answer is "no", because load balancing is only required when you have multiple brokers. @Demogorgon314 is that correct?
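
As background for the bundle-count guidance in the hunk above (64 bundles for 1000 topics across 16 brokers), here is a rough Python sketch of how hashing topic names into bundles spreads topics over several brokers. It is a simplified model under stated assumptions, not Pulsar's code: the CRC32 hash, the 32-bit hash space cut into equal ranges, and the round-robin placement of bundles on brokers are all stand-ins chosen for the example (the real load manager places bundles according to broker load).

```python
import zlib
from collections import Counter

HASH_SPACE = 2 ** 32  # assumption: a 32-bit hash space split into equal ranges


def bundle_index(topic, num_bundles):
    """Map a topic name to the bundle whose hash range contains its hash."""
    h = zlib.crc32(topic.encode("utf-8"))  # unsigned value in [0, 2**32)
    return h * num_bundles // HASH_SPACE


def topics_per_broker(topics, num_bundles, num_brokers):
    """Count topics per broker when bundles are placed round-robin (a simplification)."""
    per_bundle = Counter(bundle_index(t, num_bundles) for t in topics)
    per_broker = [0] * num_brokers
    for bundle in range(num_bundles):
        per_broker[bundle % num_brokers] += per_bundle[bundle]
    return per_broker


# Hypothetical topic names, reusing the tenant/namespace from the example command.
topics = [f"persistent://my-tenant/my-namespace/topic-{i}" for i in range(1000)]

for num_bundles in (16, 64):
    load = topics_per_broker(topics, num_bundles, num_brokers=16)
    print(f"{num_bundles} bundles: min={min(load)} max={max(load)} topics per broker")
```

With a single broker every bundle lands in the same place, which is why the bundle count only starts to matter once there are several brokers to spread bundles across.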





[GitHub] [pulsar] momo-jun commented on pull request #16069: [feature][doc] Add docs for failure domain + anti-affinity namespace

Posted by GitBox <gi...@apache.org>.
momo-jun commented on PR #16069:
URL: https://github.com/apache/pulsar/pull/16069#issuecomment-1159929676

   Ping @Anonymitaet for review.




[GitHub] [pulsar] liangyuanpeng commented on a diff in pull request #16069: [feature][doc] Add docs for failure domain + anti-affinity namespace

Posted by GitBox <gi...@apache.org>.
liangyuanpeng commented on code in PR #16069:
URL: https://github.com/apache/pulsar/pull/16069#discussion_r901461853


##########
site2/website/versioned_docs/version-2.10.0/administration-load-balance.md:
##########
@@ -1,111 +1,82 @@
 ---
 id: administration-load-balance
-title: Pulsar load balance
+title: Load balance across brokers
 sidebar_label: "Load balance"
 original_id: administration-load-balance
 ---
 
-## Load balance across Pulsar brokers
 
-Pulsar is an horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, which is a core requirement.
+Pulsar is a horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, which is a core requirement.
 
-You can use multiple settings and tools to control the traffic distribution which require a bit of context to understand how the traffic is managed in Pulsar. Though, in most cases, the core requirement mentioned above is true out of the box and you should not worry about it. 
+You can use multiple settings and tools to control the traffic distribution which requires a bit of context to understand how the traffic is managed in Pulsar. Though in most cases, the core requirement mentioned above is true out of the box and you should not worry about it. 
 
-## Pulsar load manager architecture
+The following sections introduce how the load-balanced assignments work across Pulsar brokers and how you can leverage the framework to adjust.
 
-The following part introduces the basic architecture of the Pulsar load manager.
+## Dynamic assignments
 
-### Assign topics to brokers dynamically
+Topics are dynamically assigned to brokers based on the load conditions of all brokers in the cluster. The assignment of topics to brokers is not done at the topic level but at the **bundle** level (a higher level). Instead of individual topic assignments, each broker takes ownership of a subset of the topics for a namespace. This subset is called a bundle and effectively this subset is a sharding mechanism. 
 
-Topics are dynamically assigned to brokers based on the load conditions of all brokers in the cluster.
+In other words, each namespace is an "administrative" unit and sharded into a list of bundles, with each bundle comprising a portion of the overall hash range of the namespace. Topics are assigned to a particular bundle by taking the hash of the topic name and checking in which bundle the hash falls. Each bundle is independent of the others and thus is independently assigned to different brokers.
 
-When a client starts using new topics that are not assigned to any broker, a process is triggered to choose the best suited broker to acquire ownership of these topics according to the load conditions. 
+The benefit of the assignment granularity is to amortize the amount of information that you need to keep track of. Based on CPU, memory, traffic load, and other indexes, topics are assigned to a particular broker dynamically. For example: 
+* When a client starts using new topics that are not assigned to any broker, a process is triggered to choose the best-suited broker to acquire ownership of these topics according to the load conditions. 
+* If the broker owning a topic becomes overloaded, the topic is reassigned to a less-loaded broker.
+* If the broker owning a topic crashes, the topic is reassigned to another active broker.
 
-In case of partitioned topics, different partitions are assigned to different brokers. Here "topic" means either a non-partitioned topic or one partition of a topic.
+:::tip
 
-The assignment is "dynamic" because the assignment changes quickly. For example, if the broker owning the topic crashes, the topic is reassigned immediately to another broker. Another scenario is that the broker owning the topic becomes overloaded. In this case, the topic is reassigned to a less loaded broker.
+For partitioned topics, different partitions are assigned to different brokers. Here "topic" means either a non-partitioned topic or one partition of a topic.
 
-The stateless nature of brokers makes the dynamic assignment possible, so you can quickly expand or shrink the cluster based on usage.
+:::
 
-#### Assignment granularity
+## Create namespaces with assigned bundles
 
-The assignment of topics or partitions to brokers is not done at the topics or partitions level, but done at the Bundle level (a higher level). The reason is to amortize the amount of information that you need to keep track. Based on CPU, memory, traffic load and other indexes, topics are assigned to a particular broker dynamically. 
+When you create a new namespace, a number of bundles are assigned to the namespace. You can set this number in the `conf/broker.conf` file:
 
-Instead of individual topic or partition assignment, each broker takes ownership of a subset of the topics for a namespace. This subset is called a "*bundle*" and effectively this subset is a sharding mechanism.
+```conf
 
-The namespace is the "administrative" unit: many config knobs or operations are done at the namespace level.
-
-For assignment, a namespaces is sharded into a list of "bundles", with each bundle comprising a portion of overall hash range of the namespace.
-
-Topics are assigned to a particular bundle by taking the hash of the topic name and checking in which bundle the hash falls into.
-
-Each bundle is independent of the others and thus is independently assigned to different brokers.
-
-### Create namespaces and bundles
-
-When you create a new namespace, the new namespace sets to use the default number of bundles. You can set this in `conf/broker.conf`:
-
-```properties
-
-# When a namespace is created without specifying the number of bundle, this
+# When a namespace is created without specifying the number of bundles, this
 # value will be used as the default
 defaultNumberOfNamespaceBundles=4
 
 ```
 
-You can either change the system default, or override it when you create a new namespace:
+Alternatively, you can override the value when you create a new namespace using [Pulsar admin](/tools/pulsar-admin/):
 
 ```shell
 
-$ bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters us-west --bundles 16
+bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters us-west --bundles 16
 
 ```
 
-With this command, you create a namespace with 16 initial bundles. Therefore the topics for this namespaces can immediately be spread across up to 16 brokers.
+With the above command, you create a namespace with 16 initial bundles. Therefore the topics for this namespace can immediately be spread across up to 16 brokers.
 
 In general, if you know the expected traffic and number of topics in advance, you had better start with a reasonable number of bundles instead of waiting for the system to auto-correct the distribution.
 
-On the same note, it is beneficial to start with more bundles than the number of brokers, because of the hashing nature of the distribution of topics into bundles. For example, for a namespace with 1000 topics, using something like 64 bundles achieves a good distribution of traffic across 16 brokers.
-
-### Unload topics and bundles
+On the same note, it is beneficial to start with more bundles than the number of brokers, due to the hashing nature of the distribution of topics into bundles. For example, for a namespace with 1000 topics, using something like 64 bundles achieves a good distribution of traffic across 16 brokers.
 
-You can "unload" a topic in Pulsar with admin operation. Unloading means to close the topics, release ownership and reassign the topics to a new broker, based on current load.
 
-When unloading happens, the client experiences a small latency blip, typically in the order of tens of milliseconds, while the topic is reassigned.
+## Split namespace bundles
 
-Unloading is the mechanism that the load-manager uses to perform the load shedding, but you can also trigger the unloading manually, for example to correct the assignments and redistribute traffic even before having any broker overloaded.
+Since the load for the topics in a bundle might change over time and predicting the load might be hard, bundle split is designed to resolve these challenges. The broker splits a bundle into two and the new smaller bundles can be reassigned to different brokers.
 
-Unloading a topic has no effect on the assignment, but just closes and reopens the particular topic:
+Pulsar supports the following two bundle split algorithms:
+* `range_equally_divide`: split the bundle into two parts with the same hash range size.
+* `topic_count_equally_divide`: split the bundle into two parts with the same number of topics.
 
-```shell
+To enable bundle split, you need to configure the following settings in the `broker.conf` file, and set `defaultNamespaceBundleSplitAlgorithm` based on your needs.

Review Comment:
   > load balance is only required when you have multiple brokers

   Yes, the load balance configuration is unnecessary in standalone mode.
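
For readers following the quoted hunk, the bundle sharding and the `range_equally_divide` split it describes can be sketched roughly as follows. This is only an illustration under stated assumptions, not Pulsar's implementation: the 32-bit hash space, the use of CRC32 as the hash, and the helper names `make_bundles`, `bundle_for`, and `split_range_equally` are all hypothetical choices made for the example.

```python
import zlib

# Assumption for this sketch: topic names hash into a 32-bit space.
FULL_RANGE = 0x100000000


def make_bundles(n):
    """Shard the hash space into n contiguous ranges [lo, hi)."""
    step = FULL_RANGE // n
    bounds = [i * step for i in range(n)] + [FULL_RANGE]
    return list(zip(bounds[:-1], bounds[1:]))


def bundle_for(topic, bundles):
    """Return the bundle whose hash range contains the topic's hash."""
    # CRC32 is a stand-in hash here; it yields a value in [0, 2**32).
    h = zlib.crc32(topic.encode("utf-8"))
    for lo, hi in bundles:
        if lo <= h < hi:
            return (lo, hi)
    raise ValueError("hash outside of all bundle ranges")


def split_range_equally(bundle):
    """Split one bundle into two halves at the midpoint of its hash range."""
    lo, hi = bundle
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid, hi)


# Hypothetical usage: a namespace with 4 bundles, then one bundle is split.
bundles = make_bundles(4)
owner = bundle_for("persistent://my-tenant/my-namespace/my-topic", bundles)
left, right = split_range_equally(owner)
```

After the split, each half covers a smaller hash range and could be assigned to a different broker, which is the effect the quoted section attributes to bundle splitting. A split by topic count would instead pick the boundary so that each half holds the same number of topics.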


