Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/08/17 01:36:55 UTC

[GitHub] [pulsar] momo-jun commented on a diff in pull request #16843: [improve][doc] Add more concepts/tasks for bookie isolation

momo-jun commented on code in PR #16843:
URL: https://github.com/apache/pulsar/pull/16843#discussion_r947374994


##########
site2/docs/administration-isolation-bookie.md:
##########
@@ -10,24 +10,230 @@ import TabItem from '@theme/TabItem';
 ````
 
 
-A namespace can be isolated into user-defined groups of bookies, which guarantees all the data that belongs to the namespace is stored in desired bookies. The bookie affinity group uses the BookKeeper [rack-aware placement policy](https://bookkeeper.apache.org/docs/latest/api/javadoc/org/apache/bookkeeper/client/EnsemblePlacementPolicy.html) and it is a way to feed rack information which is stored as JSON format in znode.
+Isolating bookies means isolating message storage: it is a data storage mechanism that provides isolation and safety for specific topics.
 
-You can set a bookie affinity group using one of the following methods.
+Bookie isolation is controlled by BookKeeper clients. Pulsar uses two kinds of BookKeeper clients to read and write data.
+* BookKeeper clients on the broker side
+  * Pulsar brokers use these BookKeeper clients to read and write topic messages.
+* BookKeeper clients on the bookie auto-recovery side
+  * The bookie auditor checks whether ledger replicas fulfill the configured isolation policy.
+  * The bookie replication worker writes ledger replicas to target bookies according to the configured isolation policy.
+
+To isolate bookies, you need to complete the following tasks.
+1. Select a [data isolation policy](#understand-bookie-data-isolation-policy) based on your requirements.
+2. [Enable the policy on BookKeeper clients](#enable-bookie-data-placement-policy).
+3. [Configure the policy on bookie instances](#configure-data-placement-policy-on-bookie-instances).
+
+
+## Understand bookie data isolation policy
+
+Bookie data isolation policy is built on top of the existing BookKeeper rack-aware placement policy. The “rack” concept can be anything, for example, racks, regions, or availability zones. The configured isolation policy is written into the metadata store. BookKeeper clients on both the broker side and the bookie auto-recovery side read the configured isolation policy from the metadata store and apply it when choosing bookies to store messages.
+
+BookKeeper provides three kinds of data isolation policy for disaster tolerance.
+* Rack-aware placement policy (default)
+* Region-aware placement policy
+* Zone-aware placement policy
+
+:::tip
+
+* Both [rack-aware placement policy](#rack-aware-placement-policy) and [region-aware placement policy](#region-aware-placement-policy) can be used in all kinds of deployments where racks are a subset of a region. The major difference between the two policies is:
+  * With `RackawareEnsemblePlacementPolicy` configured, the BookKeeper client chooses bookies from different **racks** to avoid a single point of failure. If only one rack is available, the policy falls back to choosing a random bookie among the available ones.
+  * With `RegionAwareEnsemblePlacementPolicy` configured, the BookKeeper client chooses bookies from different **regions**; for the selected region, it chooses bookies from different racks if more than one ensemble falls into the same region.
+
+* Zone-aware placement policy (`ZoneawareEnsemblePlacementPolicy`) can be used in a public cloud infrastructure where Availability Zones (AZs) are isolated locations within the data center regions that public cloud services originate from and operate in.
+
+:::
+
+### Rack-aware placement policy
+
+Rack-aware placement policy forces data replicas to be placed on different racks to guarantee rack-level disaster tolerance for your production environment. A data center usually has many racks, and each rack has many storage nodes. You can use `RackawareEnsemblePlacementPolicy` to configure the rack information for each bookie.
+
+#### Qualified rack size of bookies
+
+When the number of available racks meets the requirements configured on a topic, the rack-aware placement policy works well and you don’t need any extra configuration.
+
+For example, the BookKeeper cluster has 4 racks and 13 bookie instances, as shown in the following diagram. When a topic is configured with `EnsembleSize=3, WriteQuorum=3, AckQuorum=2`, the BookKeeper client chooses one bookie instance from each of three different racks to write data to, such as Bookie2, Bookie8, and Bookie12.
+
+
+![Rack-aware placement policy](/assets/rack-aware-placement-policy-1.svg)
+
+#### Enforced minimum rack size of bookies
+
+When the number of available racks cannot meet the requirements configured on a topic, how the BookKeeper client chooses bookies to recover old ledgers and create new ledgers depends on whether an enforced minimum rack size of bookies is configured.
+
+In this case, if you want the rack-aware placement policy to work as usual, you need to configure an enforced minimum rack size of bookies (`MinNumRacksPerWriteQuorum`).
+
+For example, you have the same BookKeeper cluster with the same topic requirements `EnsembleSize=3, WriteQuorum=3, AckQuorum=2` as shown in the above diagram. When all the bookie instances in Rack3 and Rack4 fail, only 2 racks remain available, and there are the following three possibilities.
+
+* If you have configured `EnforceMinNumRacksPerWriteQuorum=true` and `MinNumRacksPerWriteQuorum=3`, the BookKeeper client fails to choose bookies because the requirement of `MinNumRacksPerWriteQuorum=3` cannot be fulfilled. As a result, new ledgers cannot be created and old ledgers cannot be recovered.
+
+* If you have configured `EnforceMinNumRacksPerWriteQuorum=true` and `MinNumRacksPerWriteQuorum=2`, the BookKeeper client chooses one bookie from each of Rack1 and Rack2, such as Bookie1 and Bookie5, to recover old ledgers by re-placing the 2 replicas that were stored on Bookie8 and Bookie12. For new ledger creation, it chooses one bookie from each of Rack1 and Rack2, such as Bookie4 and Bookie7, and a random bookie from either Rack1 or Rack2 to place the last replica.
+
+![Rack-aware placement policy with an enforced minimum rack size of bookies](/assets/rack-aware-placement-policy-2.svg)
+
+* If you have configured `EnforceMinNumRacksPerWriteQuorum=false`, the BookKeeper client makes a best effort to apply the placement policy depending on the available number of racks and bookies. It may still work as in the above diagram or as in the following diagram.
+
+![Rack-aware placement policy without an enforced minimum rack size of bookies](/assets/rack-aware-placement-policy-3.svg)
+
+### Region-aware placement policy
+
+Region-aware placement policy forces data replicas to be placed in different regions and racks to guarantee region-level disaster tolerance. To achieve data-center-level disaster tolerance, you need to write data replicas into different data centers. You can use `RegionAwareEnsemblePlacementPolicy` to configure region and rack information for each bookie node.
+
+For example, the BookKeeper cluster has 4 regions, and each region has several racks with their bookie instances, as shown in the following diagram. If a topic is configured with `EnsembleSize=3, WriteQuorum=3, AckQuorum=2`, the BookKeeper client chooses three different regions, such as Region A, Region C, and Region D. For each region, it chooses one bookie on a single rack, such as Bookie5 on Rack2, Bookie17 on Rack6, and Bookie21 on Rack8.

Review Comment:
   Refers to "racks", so I used "their" here.
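
The three `EnforceMinNumRacksPerWriteQuorum` cases described in the diff above can be illustrated with a toy model. This is a minimal Python sketch, not BookKeeper's actual placement algorithm: the function `choose_ensemble`, the rack-to-bookie layout in `CLUSTER`, and the use of `RuntimeError` are all assumptions made for illustration. The doc text only states that the cluster has 4 racks and 13 bookie instances.

```python
import random

# Hypothetical layout: the doc only says "4 racks and 13 bookie instances".
CLUSTER = {
    "Rack1": ["Bookie1", "Bookie2", "Bookie3", "Bookie4"],
    "Rack2": ["Bookie5", "Bookie6", "Bookie7"],
    "Rack3": ["Bookie8", "Bookie9", "Bookie10"],
    "Rack4": ["Bookie11", "Bookie12", "Bookie13"],
}


def choose_ensemble(racks, ensemble_size, min_racks, enforce, rng=None):
    """Toy rack-aware selection (illustrative only, not BookKeeper code).

    racks maps a rack name to its live bookies; an empty list means the
    whole rack is down. Returns a list of (rack, bookie) pairs.
    """
    if rng is None:
        rng = random.Random()
    # Only racks that still have at least one live bookie count as available.
    available = {rack: list(b) for rack, b in racks.items() if b}
    if sum(len(b) for b in available.values()) < ensemble_size:
        raise RuntimeError("not enough live bookies for the ensemble")
    # With enforcement on, too few racks is a hard failure.
    if enforce and len(available) < min_racks:
        raise RuntimeError("minimum number of racks cannot be fulfilled")

    chosen = []
    # First pass: take one bookie from as many distinct racks as possible.
    rack_names = list(available)
    rng.shuffle(rack_names)
    for rack in rack_names[:ensemble_size]:
        bookie = rng.choice(available[rack])
        available[rack].remove(bookie)
        chosen.append((rack, bookie))
    # Second pass: fill any remaining slots with random leftover bookies.
    leftovers = [(rack, b) for rack, bs in available.items() for b in bs]
    rng.shuffle(leftovers)
    chosen.extend(leftovers[: ensemble_size - len(chosen)])
    return chosen
```

In this model, marking Rack3 and Rack4 as empty and calling `choose_ensemble(degraded, 3, 3, True)` raises an error, while `min_racks=2` or `enforce=False` returns three bookies spread across Rack1 and Rack2, which mirrors the three possibilities listed in the doc.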



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
