Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/06/20 05:43:01 UTC

[GitHub] [pulsar] momo-jun commented on a diff in pull request #16069: [feature][doc] Add docs for failure domain + anti-affinity namespace

momo-jun commented on code in PR #16069:
URL: https://github.com/apache/pulsar/pull/16069#discussion_r901272198


##########
site2/website/versioned_docs/version-2.10.0/administration-load-balance.md:
##########
@@ -1,111 +1,82 @@
 ---
 id: administration-load-balance
-title: Pulsar load balance
+title: Load balance across brokers
 sidebar_label: "Load balance"
 original_id: administration-load-balance
 ---
 
-## Load balance across Pulsar brokers
 
-Pulsar is an horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, which is a core requirement.
+Pulsar is a horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, which is a core requirement.
 
-You can use multiple settings and tools to control the traffic distribution which require a bit of context to understand how the traffic is managed in Pulsar. Though, in most cases, the core requirement mentioned above is true out of the box and you should not worry about it. 
+You can use multiple settings and tools to control the traffic distribution, which requires a bit of context about how traffic is managed in Pulsar. In most cases, though, the core requirement mentioned above holds out of the box and you do not need to worry about it.
 
-## Pulsar load manager architecture
+The following sections introduce how the load-balanced assignments work across Pulsar brokers and how you can leverage the framework to adjust.
 
-The following part introduces the basic architecture of the Pulsar load manager.
+## Dynamic assignments
 
-### Assign topics to brokers dynamically
+Topics are dynamically assigned to brokers based on the load conditions of all brokers in the cluster. The assignment of topics to brokers is not done at the topic level but at the **bundle** level (a higher level). Instead of individual topic assignments, each broker takes ownership of a subset of the topics for a namespace. This subset is called a bundle and effectively this subset is a sharding mechanism. 
 
-Topics are dynamically assigned to brokers based on the load conditions of all brokers in the cluster.
+In other words, each namespace is an "administrative" unit and sharded into a list of bundles, with each bundle comprising a portion of the overall hash range of the namespace. Topics are assigned to a particular bundle by taking the hash of the topic name and checking in which bundle the hash falls. Each bundle is independent of the others and thus is independently assigned to different brokers.
 
-When a client starts using new topics that are not assigned to any broker, a process is triggered to choose the best suited broker to acquire ownership of these topics according to the load conditions. 
+The benefit of the assignment granularity is to amortize the amount of information that you need to keep track of. Based on CPU, memory, traffic load, and other indexes, topics are assigned to a particular broker dynamically. For example: 
+* When a client starts using new topics that are not assigned to any broker, a process is triggered to choose the best-suited broker to acquire ownership of these topics according to the load conditions. 
+* If the broker owning a topic becomes overloaded, the topic is reassigned to a less-loaded broker.
+* If the broker owning a topic crashes, the topic is reassigned to another active broker.
 
-In case of partitioned topics, different partitions are assigned to different brokers. Here "topic" means either a non-partitioned topic or one partition of a topic.
+:::tip
 
-The assignment is "dynamic" because the assignment changes quickly. For example, if the broker owning the topic crashes, the topic is reassigned immediately to another broker. Another scenario is that the broker owning the topic becomes overloaded. In this case, the topic is reassigned to a less loaded broker.
+For partitioned topics, different partitions are assigned to different brokers. Here "topic" means either a non-partitioned topic or one partition of a topic.
 
-The stateless nature of brokers makes the dynamic assignment possible, so you can quickly expand or shrink the cluster based on usage.
+:::
 
-#### Assignment granularity
+## Create namespaces with assigned bundles
 
-The assignment of topics or partitions to brokers is not done at the topics or partitions level, but done at the Bundle level (a higher level). The reason is to amortize the amount of information that you need to keep track. Based on CPU, memory, traffic load and other indexes, topics are assigned to a particular broker dynamically. 
+When you create a new namespace, a number of bundles are assigned to the namespace. You can set this number in the `conf/broker.conf` file:
 
-Instead of individual topic or partition assignment, each broker takes ownership of a subset of the topics for a namespace. This subset is called a "*bundle*" and effectively this subset is a sharding mechanism.
+```conf
 
-The namespace is the "administrative" unit: many config knobs or operations are done at the namespace level.
-
-For assignment, a namespaces is sharded into a list of "bundles", with each bundle comprising a portion of overall hash range of the namespace.
-
-Topics are assigned to a particular bundle by taking the hash of the topic name and checking in which bundle the hash falls into.
-
-Each bundle is independent of the others and thus is independently assigned to different brokers.
-
-### Create namespaces and bundles
-
-When you create a new namespace, the new namespace sets to use the default number of bundles. You can set this in `conf/broker.conf`:
-
-```properties
-
-# When a namespace is created without specifying the number of bundle, this
+# When a namespace is created without specifying the number of bundles, this
 # value will be used as the default
 defaultNumberOfNamespaceBundles=4
 
 ```
 
-You can either change the system default, or override it when you create a new namespace:
+Alternatively, you can override the value when you create a new namespace using [Pulsar admin](/tools/pulsar-admin/):
 
 ```shell
 
-$ bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters us-west --bundles 16
+bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters us-west --bundles 16
 
 ```
 
-With this command, you create a namespace with 16 initial bundles. Therefore the topics for this namespaces can immediately be spread across up to 16 brokers.
+With the above command, you create a namespace with 16 initial bundles. Therefore the topics for this namespace can immediately be spread across up to 16 brokers.
 
 In general, if you know the expected traffic and number of topics in advance, you had better start with a reasonable number of bundles instead of waiting for the system to auto-correct the distribution.
 
-On the same note, it is beneficial to start with more bundles than the number of brokers, because of the hashing nature of the distribution of topics into bundles. For example, for a namespace with 1000 topics, using something like 64 bundles achieves a good distribution of traffic across 16 brokers.
-
-### Unload topics and bundles
+On the same note, it is beneficial to start with more bundles than the number of brokers, due to the hashing nature of the distribution of topics into bundles. For example, for a namespace with 1000 topics, using something like 64 bundles achieves a good distribution of traffic across 16 brokers.
 
-You can "unload" a topic in Pulsar with admin operation. Unloading means to close the topics, release ownership and reassign the topics to a new broker, based on current load.
 
-When unloading happens, the client experiences a small latency blip, typically in the order of tens of milliseconds, while the topic is reassigned.
+## Split namespace bundles
 
-Unloading is the mechanism that the load-manager uses to perform the load shedding, but you can also trigger the unloading manually, for example to correct the assignments and redistribute traffic even before having any broker overloaded.
+Since the load for the topics in a bundle might change over time and predicting the load might be hard, bundle split is designed to resolve these challenges. The broker splits a bundle into two and the new smaller bundles can be reassigned to different brokers.
 
-Unloading a topic has no effect on the assignment, but just closes and reopens the particular topic:
+Pulsar supports the following two bundle split algorithms:
+* `range_equally_divide`: split the bundle into two parts with the same hash range size.
+* `topic_count_equally_divide`: split the bundle into two parts with the same number of topics.
 
-```shell
+To enable bundle split, you need to configure the following settings in the `broker.conf` file, and set `defaultNamespaceBundleSplitAlgorithm` based on your needs.

Review Comment:
   I think the answer is "no", because load balance is only required when you have multiple brokers. @Demogorgon314 is it true?
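
For readers of the diff above, the bundle mechanism it describes (a namespace's hash range sharded into bundles, topics placed by hashing the topic name, and a bundle split into two equal hash ranges) might be sketched roughly as below. This is only an illustration under stated assumptions, not the broker's implementation: the 32-bit hash space, the use of CRC32 as the hash, and the helper names `bundle_boundaries`, `bundle_for_topic`, and `split_range_equally` are all hypothetical choices made for the example.

```python
import bisect
import zlib

# Assumption: a 32-bit hash space; the quoted doc only says "overall hash range".
FULL_RANGE = 0x100000000


def bundle_boundaries(num_bundles):
    """Cut the hash space into num_bundles contiguous, equal-sized ranges.

    Bundle i covers the half-open range [boundaries[i], boundaries[i + 1]).
    """
    return [i * FULL_RANGE // num_bundles for i in range(num_bundles + 1)]


def bundle_for_topic(topic, boundaries):
    """Return the index of the bundle whose range contains the topic's hash."""
    # Assumption: CRC32 stands in for whatever hash the broker really uses.
    h = zlib.crc32(topic.encode("utf-8"))
    return bisect.bisect_right(boundaries, h) - 1


def split_range_equally(boundaries, index):
    """Split bundle `index` at the midpoint of its hash range.

    This mirrors the idea behind `range_equally_divide`: two parts with the
    same hash range size, regardless of how many topics land in each.
    """
    lo, hi = boundaries[index], boundaries[index + 1]
    mid = lo + (hi - lo) // 2
    return boundaries[: index + 1] + [mid] + boundaries[index + 1:]


if __name__ == "__main__":
    bounds = bundle_boundaries(4)  # like defaultNumberOfNamespaceBundles=4
    topic = "persistent://my-tenant/my-namespace/my-topic"
    idx = bundle_for_topic(topic, bounds)
    print("topic falls in bundle", idx, "of", len(bounds) - 1)
    bounds = split_range_equally(bounds, idx)
    print("after split, topic falls in bundle", bundle_for_topic(topic, bounds))
```

Because each bundle is just a range of hash values, a split only inserts one new boundary: topics in the other bundles keep their placement, and the topics of the split bundle are divided between the two halves, which can then be assigned to different brokers.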


