You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/12/11 15:12:06 UTC

[GitHub] [pulsar] asafm opened a new pull request, #18878: [improve][doc] Added docs for key shared subscription hashing schemes

asafm opened a new pull request, #18878:
URL: https://github.com/apache/pulsar/pull/18878

   ### Motivation
   
   Key Shared subscription type has multiple algorithms for mapping a message key to a consumer, yet they are not documented, leaving the user with no easy way of knowing how a Key Shared Subscription works and what are the possible options for configuration.
   
   ### Modifications
   
   Added documentation under Messaging / Concepts / Subscriptions / Key Shared
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [x] `doc` <!-- Your PR contains doc changes. Please attach the local preview screenshots (run `sh start.sh` at `pulsar/site2/website`) to your PR description, or else your PR might not get merged. -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
   - [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->
   
   ### Matching PR in forked repository
   
   PR in forked repository: https://github.com/asafm/pulsar/pull/1
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Anonymitaet commented on pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#issuecomment-1350512540

   > What is left is that I need to know how to make the diagrams consistent with the rest of the documentation
   
   Normally, we use LucidChart to draw illustrations, e.g., this is what I'm working on: https://lucid.app/lucidchart/323370a9-0466-49e1-836a-968885aaf23a/edit?invitationId=inv_55ab3a55-7f98-4943-baa4-a8be8ddf7fac&page=0_0#


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on a diff in pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
asafm commented on code in PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#discussion_r1047157234


##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,157 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 

Review Comment:
   <img width="1056" alt="image" src="https://user-images.githubusercontent.com/989425/207333821-2b8f9bc2-1fd6-40b7-a428-7372d0c2720c.png">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on a diff in pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
asafm commented on code in PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#discussion_r1047160951


##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,157 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer for a given message key (or ordering key): Sticky, Auto-split Hash Range, and Auto-split Consistent Hashing. The steps for all algorithms are:
+1. The message key (or ordering key) is passed to a hash function (e.g., Murmur3 32-bit), yielding a 32-bit integer hash. 
+2. That hash number is fed to the algorithm to select a consumer from the existing connected consumers.
+
+```
+                      +--------------+                              +-----------+
+Message Key ----->  / Hash Function / ----- hash (32-bit) -------> / Algorithm / ----> Consumer   
+                   +---------------+                               +----------+
+```
+
+
+When a new consumer is connected and thus added to the list of connected consumers, the algorithm re-adjusts the mapping such that some keys currently mapped to existing consumers will be mapped to the newly added consumer. When a consumer is disconnected, thus removed from the list of connected consumers, keys mapped to it will be mapped to other consumers. The sections below will explain how a consumer is selected given the message hash and how the mapping is adjusted given a new consumer is connected or an existing consumer disconnects for each algorithm.
+
+##### Auto-split Hash Range
+
+The algorithm assumes there is a range of numbers between 0 to 2^16 (65,536). Each consumer is mapped into a single region in this range, so all mapped regions cover the entire range, and no regions overlap. A consumer is selected for a given key by running a modulo operation on the message hash by the range size (65,536). The number received ( 0 <= i < 65,536) is contained within a single region. The consumer mapped to that region is the one selected.
+
+Example:
+
+Suppose we have 4 consumers (C1, C2, C3 and C4), then:
+
+```
+ 0               16,384            32,768           49,152             65,536
+ |------- C3 ------|------- C2 ------|------- C1 ------|------- C4 ------|
+```
+
+Given a message key `Order-3459134`, its hash would be `murmur32("Order-3459134") = 3112179635`, and its index in the range would be `3112179635 mod 65536 = 6067`. That index is contained within region `[0, 16384)` thus consumer C1 will be mapped to this message key.
+
+When a new consumer is connected, the largest region is chosen and is then split in half - the lower half will be mapped to the newly added consumer and upper half will be mapped to the consumer owning that region. Here is how it looks like from 1 to 4 consumers:
+
+```
+C1 connected:
+|---------------------------------- C1 ---------------------------------|
+
+C2 connected:
+|--------------- C2 ----------------|---------------- C1 ---------------|
+
+C3 connected:
+|------- C3 ------|------- C2 ------|---------------- C1 ---------------|
+
+C4 connected:
+|------- C3 ------|------- C2 ------|------- C1 ------|------- C4 ------|
+```
+
+When a consumer is disconnected its region will be merged into the region on its right. Examples:
+
+C4 is disconnected:
+
+```
+|------- C3 ------|------- C2 ------|---------------- C1 ---------------|
+```
+
+C1 is disconnected:
+
+```
+|------- C3 ------|-------------------------- C2 -----------------------|
+```
+
+The advantages of this algorithm is that it affects only a single existing consumer upon add/delete consumer, at the expense of regions not evenly sized. Thi means some consumers gets more keys that others. The next algorithm does the other way around. 
+
+##### Auto-split Consistent Hashing
+
+This algorithm uses a Hash Ring. It's a range of number from 0 to MAX_INT (32-bit) in which if you traverse the range, when reaching MAX_INT, the next number would be zero. It is as if you took a line starting from 0 ending at MAX_INT and bent into a circle such that the end glues to the start:
+
+```
+ MAX_INT -----++--------- 0
+              ||
+         , - ~ ~ ~ - ,
+     , '               ' ,
+   ,                       ,
+  ,                         ,
+ ,                           ,
+ ,                           ,
+ ,                           ,
+  ,                         ,
+   ,                       ,
+     ,                  , '
+       ' - , _ _ _ ,  '
+```
+
+When adding a consumer, we mark 100 points on that circle and associate them to the newly added consumer. For each number between 1 and 100, we concatenate the consumer name to that number and run the hash function on it to get the location of the point on the circle that will be marked. For Example, if the consumer name is "orders-aggregator-pod-2345-consumer" then we would mark 100 points on that circle:
+```
+    murmur32("orders-aggregator-pod-2345-consumer1") = 1003084738
+    murmur32("orders-aggregator-pod-2345-consumer2") = 373317202
+    ...
+    murmur32("orders-aggregator-pod-2345-consumer100") = 320276078
+```
+
+Since the hash function has the uniform distribution attribute, those points would be uniformly distributed across the circle.
+
+```
+    C1-100                 
+         , - ~ ~ ~ - ,   C1-1
+     , '               ' ,
+   ,                       ,
+  ,                         , C1-2
+ ,                           ,
+ ,                           ,
+ ,                           ,
+  ,                         ,  C1-3
+   ,                       ,
+     ,                  , '
+       ' - , _ _ _ ,  '      ...
+ 
+```
+
+A consumer is selected for a given message key by putting its hash on the circle then continue clock-wise on the circle until you reach a marking point. The point might have more than one consumer on it (hash function might have collisions) there for, we run the following operation to get a position within the list of consumers for that point, then we take the consumer in that position: `hash % consumer_list_size = index`.
+
+When a consumer is added, we add 100 marking points to the circle as explained before. Due to the uniform distribution of the hash function, those 100 points act as if the new consumer takes a small slice of keys out of each existing consumer. It maintains the even distribution, on the trade-off that it impacts all existing consumers. [This video](https://www.youtube.com/watch?v=zaRkONvyGr8) explains the concept of Consistent Hashing quite well (the only difference is that in Pulsar's case we used K points instead of K hash functions as noted in the comments)
+
+##### Sticky
+
+The algorithm assumes there is a range of numbers between 0 to 2^16 (65,536). Each consumer is mapped to a multiple regions in this range and there is no overlap between regions. The consumer is selected by running a modulo operation on the message hash by the range size (65,536), the number received (0 <= i < 65,536), is contained within a single region. The consumer mapped to the region is the one selected.
+In this algorithm you have full control. Every newly added consumer specifies the ranges it wishes to be mapped to by using Consumer API. When the consumer object is constructed, you can specify the list of ranges. It's your responsibility to make sure there are no overlaps and all the range is covered by regions.
+
+Example:
+
+Suppose we have 2 consumers (C1 and C2) each specified their ranges, then:
+
+```
+C1 = [0, 16384), (32768, 49152]
+C2 = [16384, 32768), (49,152, 65536]
+ 
+ 0               16,384            32,768           49,152             65,536
+ |------- C1 ------|------- C2 ------|------- C1 ------|------- C2 ------|
+```
+
+Given a message key `Order-3459134`, it's hash would be `murmur32("Order-3459134") = 3112179635`, and it's index in the range would be `3112179635 mod 65536 = 6067`. That index is contained within `[0, 16384)` thus consumer C1 will map to this message key.
+
+If the newly connected consumer didn't supply their ranges, or they overlap with existing consumer ranges, it's disconnected, removed from the consumers list and reverted as if it never tried to connect.
+
+##### How to use them?
+
+When building the consumer, you can specify the Key Shared Mode: 
+* AUTO_SPLIT - Auto-split Hash Range
+* STICKY - Sticky
+
+Consistent Hashing will be used instead of Hash Range for Auto-split if the broker configuration `subscriptionKeySharedUseConsistentHashing` is enabled.
+
+##### Preserving order of processing

Review Comment:
   <img width="1041" alt="image" src="https://user-images.githubusercontent.com/989425/207335516-e6fec8ab-a63d-44e3-8b07-48dab6350bc2.png">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Anonymitaet commented on pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#issuecomment-1345830327

   FYI: markdown doc changes can skip this step
   
   <img width="337" alt="image" src="https://user-images.githubusercontent.com/50226895/206956621-0444d577-1de3-4dec-9181-e3bfb25a6c87.png">
   
   https://github.com/apache/pulsar/blob/346b04a8149df0683891408456bd4306b39f2103/.github/changes-filter.yaml#L9
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on a diff in pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
asafm commented on code in PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#discussion_r1047160400


##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,157 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer for a given message key (or ordering key): Sticky, Auto-split Hash Range, and Auto-split Consistent Hashing. The steps for all algorithms are:
+1. The message key (or ordering key) is passed to a hash function (e.g., Murmur3 32-bit), yielding a 32-bit integer hash. 
+2. That hash number is fed to the algorithm to select a consumer from the existing connected consumers.
+
+```
+                      +--------------+                              +-----------+
+Message Key ----->  / Hash Function / ----- hash (32-bit) -------> / Algorithm / ----> Consumer   
+                   +---------------+                               +----------+
+```
+
+
+When a new consumer is connected and thus added to the list of connected consumers, the algorithm re-adjusts the mapping such that some keys currently mapped to existing consumers will be mapped to the newly added consumer. When a consumer is disconnected, thus removed from the list of connected consumers, keys mapped to it will be mapped to other consumers. The sections below will explain how a consumer is selected given the message hash and how the mapping is adjusted given a new consumer is connected or an existing consumer disconnects for each algorithm.
+
+##### Auto-split Hash Range
+
+The algorithm assumes there is a range of numbers between 0 to 2^16 (65,536). Each consumer is mapped into a single region in this range, so all mapped regions cover the entire range, and no regions overlap. A consumer is selected for a given key by running a modulo operation on the message hash by the range size (65,536). The number received ( 0 <= i < 65,536) is contained within a single region. The consumer mapped to that region is the one selected.
+
+Example:
+
+Suppose we have 4 consumers (C1, C2, C3 and C4), then:
+
+```
+ 0               16,384            32,768           49,152             65,536
+ |------- C3 ------|------- C2 ------|------- C1 ------|------- C4 ------|
+```
+
+Given a message key `Order-3459134`, its hash would be `murmur32("Order-3459134") = 3112179635`, and its index in the range would be `3112179635 mod 65536 = 6067`. That index is contained within region `[0, 16384)` thus consumer C1 will be mapped to this message key.
+
+When a new consumer is connected, the largest region is chosen and is then split in half - the lower half will be mapped to the newly added consumer and upper half will be mapped to the consumer owning that region. Here is how it looks like from 1 to 4 consumers:
+
+```
+C1 connected:
+|---------------------------------- C1 ---------------------------------|
+
+C2 connected:
+|--------------- C2 ----------------|---------------- C1 ---------------|
+
+C3 connected:
+|------- C3 ------|------- C2 ------|---------------- C1 ---------------|
+
+C4 connected:
+|------- C3 ------|------- C2 ------|------- C1 ------|------- C4 ------|
+```
+
+When a consumer is disconnected its region will be merged into the region on its right. Examples:
+
+C4 is disconnected:
+
+```
+|------- C3 ------|------- C2 ------|---------------- C1 ---------------|
+```
+
+C1 is disconnected:
+
+```
+|------- C3 ------|-------------------------- C2 -----------------------|
+```
+
+The advantages of this algorithm is that it affects only a single existing consumer upon add/delete consumer, at the expense of regions not evenly sized. Thi means some consumers gets more keys that others. The next algorithm does the other way around. 
+
+##### Auto-split Consistent Hashing
+
+This algorithm uses a Hash Ring. It's a range of number from 0 to MAX_INT (32-bit) in which if you traverse the range, when reaching MAX_INT, the next number would be zero. It is as if you took a line starting from 0 ending at MAX_INT and bent into a circle such that the end glues to the start:
+
+```
+ MAX_INT -----++--------- 0
+              ||
+         , - ~ ~ ~ - ,
+     , '               ' ,
+   ,                       ,
+  ,                         ,
+ ,                           ,
+ ,                           ,
+ ,                           ,
+  ,                         ,
+   ,                       ,
+     ,                  , '
+       ' - , _ _ _ ,  '
+```
+
+When adding a consumer, we mark 100 points on that circle and associate them to the newly added consumer. For each number between 1 and 100, we concatenate the consumer name to that number and run the hash function on it to get the location of the point on the circle that will be marked. For Example, if the consumer name is "orders-aggregator-pod-2345-consumer" then we would mark 100 points on that circle:
+```
+    murmur32("orders-aggregator-pod-2345-consumer1") = 1003084738
+    murmur32("orders-aggregator-pod-2345-consumer2") = 373317202
+    ...
+    murmur32("orders-aggregator-pod-2345-consumer100") = 320276078
+```
+
+Since the hash function has the uniform distribution attribute, those points would be uniformly distributed across the circle.
+
+```
+    C1-100                 
+         , - ~ ~ ~ - ,   C1-1
+     , '               ' ,
+   ,                       ,
+  ,                         , C1-2
+ ,                           ,
+ ,                           ,
+ ,                           ,
+  ,                         ,  C1-3
+   ,                       ,
+     ,                  , '
+       ' - , _ _ _ ,  '      ...
+ 
+```
+
+A consumer is selected for a given message key by putting its hash on the circle then continue clock-wise on the circle until you reach a marking point. The point might have more than one consumer on it (hash function might have collisions) there for, we run the following operation to get a position within the list of consumers for that point, then we take the consumer in that position: `hash % consumer_list_size = index`.
+
+When a consumer is added, we add 100 marking points to the circle as explained before. Due to the uniform distribution of the hash function, those 100 points act as if the new consumer takes a small slice of keys out of each existing consumer. It maintains the even distribution, on the trade-off that it impacts all existing consumers. [This video](https://www.youtube.com/watch?v=zaRkONvyGr8) explains the concept of Consistent Hashing quite well (the only difference is that in Pulsar's case we used K points instead of K hash functions as noted in the comments)
+
+##### Sticky

Review Comment:
   <img width="1046" alt="image" src="https://user-images.githubusercontent.com/989425/207335189-6d5fc449-26a0-4985-a4c3-221beab3037d.png">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on a diff in pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
asafm commented on code in PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#discussion_r1045646566


##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,137 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer for a given message key (or ordering key): Sticky, Auto-split Hash Range, and Auto-split Consistent Hashing. Before using the algorithm, the message key (or ordering key) is first passed to a hash function (e.g., Murmur3 32-bit), yielding a 32-bit integer hash. That hash number is then fed to the algorithm to select a consumer from the existing connected consumers.
+When a new consumer is connected and thus added to the list of connected consumers, the algorithm re-adjusts the mapping such that some keys currently mapped to existing consumers will be mapped to the newly added consumer. When a consumer is disconnected, thus removed from the list of connected consumers, keys mapped to it will be mapped to other consumers. The sections below will explain how a consumer is selected given the message hash and how the mapping is adjusted given a new consumer is connected or an existing consumer disconnects for each algorithm.

Review Comment:
   I converted it into a list of steps (2) and added an ASCII diagram for it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on a diff in pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
asafm commented on code in PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#discussion_r1047159786


##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,157 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer for a given message key (or ordering key): Sticky, Auto-split Hash Range, and Auto-split Consistent Hashing. The steps for all algorithms are:
+1. The message key (or ordering key) is passed to a hash function (e.g., Murmur3 32-bit), yielding a 32-bit integer hash. 
+2. That hash number is fed to the algorithm to select a consumer from the existing connected consumers.
+
+```
+                      +--------------+                              +-----------+
+Message Key ----->  / Hash Function / ----- hash (32-bit) -------> / Algorithm / ----> Consumer   
+                   +---------------+                               +----------+
+```
+
+
+When a new consumer is connected and thus added to the list of connected consumers, the algorithm re-adjusts the mapping such that some keys currently mapped to existing consumers will be mapped to the newly added consumer. When a consumer is disconnected, thus removed from the list of connected consumers, keys mapped to it will be mapped to other consumers. The sections below will explain how a consumer is selected given the message hash and how the mapping is adjusted given a new consumer is connected or an existing consumer disconnects for each algorithm.
+
+##### Auto-split Hash Range
+
+The algorithm assumes there is a range of numbers between 0 to 2^16 (65,536). Each consumer is mapped into a single region in this range, so all mapped regions cover the entire range, and no regions overlap. A consumer is selected for a given key by running a modulo operation on the message hash by the range size (65,536). The number received ( 0 <= i < 65,536) is contained within a single region. The consumer mapped to that region is the one selected.
+
+Example:
+
+Suppose we have 4 consumers (C1, C2, C3 and C4), then:
+
+```
+ 0               16,384            32,768           49,152             65,536
+ |------- C3 ------|------- C2 ------|------- C1 ------|------- C4 ------|
+```
+
+Given a message key `Order-3459134`, its hash would be `murmur32("Order-3459134") = 3112179635`, and its index in the range would be `3112179635 mod 65536 = 6067`. That index is contained within region `[0, 16384)` thus consumer C1 will be mapped to this message key.
+
+When a new consumer is connected, the largest region is chosen and is then split in half - the lower half will be mapped to the newly added consumer and upper half will be mapped to the consumer owning that region. Here is how it looks like from 1 to 4 consumers:
+
+```
+C1 connected:
+|---------------------------------- C1 ---------------------------------|
+
+C2 connected:
+|--------------- C2 ----------------|---------------- C1 ---------------|
+
+C3 connected:
+|------- C3 ------|------- C2 ------|---------------- C1 ---------------|
+
+C4 connected:
+|------- C3 ------|------- C2 ------|------- C1 ------|------- C4 ------|
+```
+
+When a consumer is disconnected its region will be merged into the region on its right. Examples:
+
+C4 is disconnected:
+
+```
+|------- C3 ------|------- C2 ------|---------------- C1 ---------------|
+```
+
+C1 is disconnected:
+
+```
+|------- C3 ------|-------------------------- C2 -----------------------|
+```
+
+The advantages of this algorithm is that it affects only a single existing consumer upon add/delete consumer, at the expense of regions not evenly sized. Thi means some consumers gets more keys that others. The next algorithm does the other way around. 
+
+##### Auto-split Consistent Hashing
+
+This algorithm uses a Hash Ring. It's a range of number from 0 to MAX_INT (32-bit) in which if you traverse the range, when reaching MAX_INT, the next number would be zero. It is as if you took a line starting from 0 ending at MAX_INT and bent into a circle such that the end glues to the start:
+
+```
+ MAX_INT -----++--------- 0
+              ||
+         , - ~ ~ ~ - ,
+     , '               ' ,
+   ,                       ,
+  ,                         ,
+ ,                           ,
+ ,                           ,
+ ,                           ,
+  ,                         ,
+   ,                       ,
+     ,                  , '
+       ' - , _ _ _ ,  '
+```
+
+When adding a consumer, we mark 100 points on that circle and associate them to the newly added consumer. For each number between 1 and 100, we concatenate the consumer name to that number and run the hash function on it to get the location of the point on the circle that will be marked. For Example, if the consumer name is "orders-aggregator-pod-2345-consumer" then we would mark 100 points on that circle:

Review Comment:
   <img width="1053" alt="image" src="https://user-images.githubusercontent.com/989425/207334834-3585dedf-dabd-417b-b2a3-4f116009ded4.png">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Anonymitaet commented on a diff in pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on code in PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#discussion_r1045384995


##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,137 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer for a given message key (or ordering key): Sticky, Auto-split Hash Range, and Auto-split Consistent Hashing. Before using the algorithm, the message key (or ordering key) is first passed to a hash function (e.g., Murmur3 32-bit), yielding a 32-bit integer hash. That hash number is then fed to the algorithm to select a consumer from the existing connected consumers.
+When a new consumer is connected and thus added to the list of connected consumers, the algorithm re-adjusts the mapping such that some keys currently mapped to existing consumers will be mapped to the newly added consumer. When a consumer is disconnected, thus removed from the list of connected consumers, keys mapped to it will be mapped to other consumers. The sections below will explain how a consumer is selected given the message hash and how the mapping is adjusted given a new consumer is connected or an existing consumer disconnects for each algorithm.

Review Comment:
   Suggest changing the chunks to ordered steps and using workflow (illustrate) to explain the process and concept more clearly. For example, https://pulsar.apache.org/docs/next/schema-overview/#producer-side
   ![image](https://user-images.githubusercontent.com/50226895/206957180-81ba6802-a75a-47f2-9ea8-37817a87dd81.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on a diff in pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
asafm commented on code in PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#discussion_r1047158588


##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,157 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer for a given message key (or ordering key): Sticky, Auto-split Hash Range, and Auto-split Consistent Hashing. The steps for all algorithms are:
+1. The message key (or ordering key) is passed to a hash function (e.g., Murmur3 32-bit), yielding a 32-bit integer hash. 
+2. That hash number is fed to the algorithm to select a consumer from the existing connected consumers.
+
+```
+                      +--------------+                              +-----------+
+Message Key ----->  / Hash Function / ----- hash (32-bit) -------> / Algorithm / ----> Consumer   
+                   +---------------+                               +----------+
+```
+
+
+When a new consumer is connected and thus added to the list of connected consumers, the algorithm re-adjusts the mapping such that some keys currently mapped to existing consumers will be mapped to the newly added consumer. When a consumer is disconnected, thus removed from the list of connected consumers, keys mapped to it will be mapped to other consumers. The sections below will explain how a consumer is selected given the message hash and how the mapping is adjusted given a new consumer is connected or an existing consumer disconnects for each algorithm.
+
+##### Auto-split Hash Range
+
+The algorithm assumes there is a range of numbers between 0 to 2^16 (65,536). Each consumer is mapped into a single region in this range, so all mapped regions cover the entire range, and no regions overlap. A consumer is selected for a given key by running a modulo operation on the message hash by the range size (65,536). The number received ( 0 <= i < 65,536) is contained within a single region. The consumer mapped to that region is the one selected.
+
+Example:
+
+Suppose we have 4 consumers (C1, C2, C3 and C4), then:
+
+```
+ 0               16,384            32,768           49,152             65,536
+ |------- C3 ------|------- C2 ------|------- C1 ------|------- C4 ------|
+```
+
+Given a message key `Order-3459134`, its hash would be `murmur32("Order-3459134") = 3112179635`, and its index in the range would be `3112179635 mod 65536 = 6067`. That index is contained within region `[0, 16384)` thus consumer C1 will be mapped to this message key.
+
+When a new consumer is connected, the largest region is chosen and is then split in half - the lower half will be mapped to the newly added consumer and upper half will be mapped to the consumer owning that region. Here is how it looks like from 1 to 4 consumers:
+
+```
+C1 connected:
+|---------------------------------- C1 ---------------------------------|
+
+C2 connected:
+|--------------- C2 ----------------|---------------- C1 ---------------|
+
+C3 connected:
+|------- C3 ------|------- C2 ------|---------------- C1 ---------------|
+
+C4 connected:
+|------- C3 ------|------- C2 ------|------- C1 ------|------- C4 ------|
+```
+
+When a consumer is disconnected its region will be merged into the region on its right. Examples:
+
+C4 is disconnected:

Review Comment:
   <img width="1050" alt="image" src="https://user-images.githubusercontent.com/989425/207334474-4e0ded80-61f3-4d9c-b9ce-28a4aaa1e6c0.png">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] tisonkun commented on pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
tisonkun commented on PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#issuecomment-1350514520

   > > What is left is that I need to know how to make the diagrams consistent with the rest of the documentation
   > 
   > Normally, we use LucidChart to draw illustrations, e.g., this is what I'm working on: https://lucid.app/lucidchart/323370a9-0466-49e1-836a-968885aaf23a/edit?invitationId=inv_55ab3a55-7f98-4943-baa4-a8be8ddf7fac&page=0_0#
   
   @asafm as the PR merged now, you may send a new patch to update the diagrams :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
asafm commented on PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#issuecomment-1351723698

   @tisonkun Thanks :) Better that than no explanation on shared subscription hashing, for sure.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Anonymitaet merged pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
Anonymitaet merged PR #18878:
URL: https://github.com/apache/pulsar/pull/18878


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on a diff in pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
asafm commented on code in PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#discussion_r1045627997


##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,137 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer for a given message key (or ordering key): Sticky, Auto-split Hash Range, and Auto-split Consistent Hashing. Before using the algorithm, the message key (or ordering key) is first passed to a hash function (e.g., Murmur3 32-bit), yielding a 32-bit integer hash. That hash number is then fed to the algorithm to select a consumer from the existing connected consumers.
+When a new consumer is connected and thus added to the list of connected consumers, the algorithm re-adjusts the mapping such that some keys currently mapped to existing consumers will be mapped to the newly added consumer. When a consumer is disconnected, thus removed from the list of connected consumers, keys mapped to it will be mapped to other consumers. The sections below will explain how a consumer is selected given the message hash and how the mapping is adjusted given a new consumer is connected or an existing consumer disconnects for each algorithm.

Review Comment:
   I will add a new line.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on a diff in pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
asafm commented on code in PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#discussion_r1045632851


##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,137 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer for a given message key (or ordering key): Sticky, Auto-split Hash Range, and Auto-split Consistent Hashing. Before using the algorithm, the message key (or ordering key) is first passed to a hash function (e.g., Murmur3 32-bit), yielding a 32-bit integer hash. That hash number is then fed to the algorithm to select a consumer from the existing connected consumers.
+When a new consumer is connected and thus added to the list of connected consumers, the algorithm re-adjusts the mapping such that some keys currently mapped to existing consumers will be mapped to the newly added consumer. When a consumer is disconnected, thus removed from the list of connected consumers, keys mapped to it will be mapped to other consumers. The sections below will explain how a consumer is selected given the message hash and how the mapping is adjusted given a new consumer is connected or an existing consumer disconnects for each algorithm.
+
+##### Auto-split Hash Range
+The algorithm assumes there is a range of numbers between 0 to 2^16 (65,536). Each consumer is mapped into a single region in this range, so all mapped regions cover the entire range, and no regions overlap. A consumer is selected for a given key by running a modulo operation on the message hash by the range size (65,536). The number received ( 0 <= i < 65,536) is contained within a single region. The consumer mapped to that region is the one selected.
+
+Example:
+
+Suppose we have 4 consumers (C1, C2, C3 and C4), then:
+```

Review Comment:
   Added new lines everywhere :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
asafm commented on PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#issuecomment-1346223063

   > FYI: markdown doc changes can skip this step
   > 
   > <img alt="image" width="337" src="https://user-images.githubusercontent.com/50226895/206956621-0444d577-1de3-4dec-9181-e3bfb25a6c87.png">
   > 
   > https://github.com/apache/pulsar/blob/346b04a8149df0683891408456bd4306b39f2103/.github/changes-filter.yaml#L9
   
   @Anonymitaet Maybe we should note that in the documentation contribution guide. 
   I followed https://pulsar.apache.org/contribute/document-intro/, which says:
   ![image](https://user-images.githubusercontent.com/989425/207020274-51540430-f1ae-4330-a6d6-18855f7aed45.png)
   and here https://pulsar.apache.org/contribute/develop-labels/ it says
   ![image](https://user-images.githubusercontent.com/989425/207020420-cff2377a-6f56-4ffb-bfc3-a56bc124032b.png)
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] tisonkun commented on a diff in pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
tisonkun commented on code in PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#discussion_r1045340003


##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,137 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer for a given message key (or ordering key): Sticky, Auto-split Hash Range, and Auto-split Consistent Hashing. Before using the algorithm, the message key (or ordering key) is first passed to a hash function (e.g., Murmur3 32-bit), yielding a 32-bit integer hash. That hash number is then fed to the algorithm to select a consumer from the existing connected consumers.
+When a new consumer is connected and thus added to the list of connected consumers, the algorithm re-adjusts the mapping such that some keys currently mapped to existing consumers will be mapped to the newly added consumer. When a consumer is disconnected, thus removed from the list of connected consumers, keys mapped to it will be mapped to other consumers. The sections below will explain how a consumer is selected given the message hash and how the mapping is adjusted given a new consumer is connected or an existing consumer disconnects for each algorithm.
+
+##### Auto-split Hash Range
+The algorithm assumes there is a range of numbers between 0 to 2^16 (65,536). Each consumer is mapped into a single region in this range, so all mapped regions cover the entire range, and no regions overlap. A consumer is selected for a given key by running a modulo operation on the message hash by the range size (65,536). The number received ( 0 <= i < 65,536) is contained within a single region. The consumer mapped to that region is the one selected.
+
+Example:
+
+Suppose we have 4 consumers (C1, C2, C3 and C4), then:
+```

Review Comment:
   ditto these cases



##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,137 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer for a given message key (or ordering key): Sticky, Auto-split Hash Range, and Auto-split Consistent Hashing. Before using the algorithm, the message key (or ordering key) is first passed to a hash function (e.g., Murmur3 32-bit), yielding a 32-bit integer hash. That hash number is then fed to the algorithm to select a consumer from the existing connected consumers.
+When a new consumer is connected and thus added to the list of connected consumers, the algorithm re-adjusts the mapping such that some keys currently mapped to existing consumers will be mapped to the newly added consumer. When a consumer is disconnected, thus removed from the list of connected consumers, keys mapped to it will be mapped to other consumers. The sections below will explain how a consumer is selected given the message hash and how the mapping is adjusted given a new consumer is connected or an existing consumer disconnects for each algorithm.
+
+##### Auto-split Hash Range
+The algorithm assumes there is a range of numbers between 0 to 2^16 (65,536). Each consumer is mapped into a single region in this range, so all mapped regions cover the entire range, and no regions overlap. A consumer is selected for a given key by running a modulo operation on the message hash by the range size (65,536). The number received ( 0 <= i < 65,536) is contained within a single region. The consumer mapped to that region is the one selected.

Review Comment:
   ```suggestion
   ##### Auto-split Hash Range
   
   The algorithm assumes there is a range of numbers between 0 to 2^16 (65,536). Each consumer is mapped into a single region in this range, so all mapped regions cover the entire range, and no regions overlap. A consumer is selected for a given key by running a modulo operation on the message hash by the range size (65,536). The number received (0 <= i < 65,536) is contained within a single region. The consumer mapped to that region is the one selected.
   ```



##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,137 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer for a given message key (or ordering key): Sticky, Auto-split Hash Range, and Auto-split Consistent Hashing. Before using the algorithm, the message key (or ordering key) is first passed to a hash function (e.g., Murmur3 32-bit), yielding a 32-bit integer hash. That hash number is then fed to the algorithm to select a consumer from the existing connected consumers.
+When a new consumer is connected and thus added to the list of connected consumers, the algorithm re-adjusts the mapping such that some keys currently mapped to existing consumers will be mapped to the newly added consumer. When a consumer is disconnected, thus removed from the list of connected consumers, keys mapped to it will be mapped to other consumers. The sections below will explain how a consumer is selected given the message hash and how the mapping is adjusted given a new consumer is connected or an existing consumer disconnects for each algorithm.

Review Comment:
   If you intend to write these two lines in different paragraphs, insert a blank line between them. Otherwise, this paragraph reads too long at first glance.



##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,137 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer for a given message key (or ordering key): Sticky, Auto-split Hash Range, and Auto-split Consistent Hashing. Before using the algorithm, the message key (or ordering key) is first passed to a hash function (e.g., Murmur3 32-bit), yielding a 32-bit integer hash. That hash number is then fed to the algorithm to select a consumer from the existing connected consumers.
+When a new consumer is connected and thus added to the list of connected consumers, the algorithm re-adjusts the mapping such that some keys currently mapped to existing consumers will be mapped to the newly added consumer. When a consumer is disconnected, thus removed from the list of connected consumers, keys mapped to it will be mapped to other consumers. The sections below will explain how a consumer is selected given the message hash and how the mapping is adjusted given a new consumer is connected or an existing consumer disconnects for each algorithm.
+
+##### Auto-split Hash Range
+The algorithm assumes there is a range of numbers between 0 to 2^16 (65,536). Each consumer is mapped into a single region in this range, so all mapped regions cover the entire range, and no regions overlap. A consumer is selected for a given key by running a modulo operation on the message hash by the range size (65,536). The number received ( 0 <= i < 65,536) is contained within a single region. The consumer mapped to that region is the one selected.

Review Comment:
   We may try to insert a blank line between elements even when it's not required. Perhaps we need to write it explicitly in the documentation writing style guide. So it's now not a blocker.
   
   But I read you also insert blank lines between elements below, then let's keep the style consistent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on a diff in pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
asafm commented on code in PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#discussion_r1045631317


##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,137 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer for a given message key (or ordering key): Sticky, Auto-split Hash Range, and Auto-split Consistent Hashing. Before using the algorithm, the message key (or ordering key) is first passed to a hash function (e.g., Murmur3 32-bit), yielding a 32-bit integer hash. That hash number is then fed to the algorithm to select a consumer from the existing connected consumers.
+When a new consumer is connected and thus added to the list of connected consumers, the algorithm re-adjusts the mapping such that some keys currently mapped to existing consumers will be mapped to the newly added consumer. When a consumer is disconnected, thus removed from the list of connected consumers, keys mapped to it will be mapped to other consumers. The sections below will explain how a consumer is selected given the message hash and how the mapping is adjusted given a new consumer is connected or an existing consumer disconnects for each algorithm.
+
+##### Auto-split Hash Range
+The algorithm assumes there is a range of numbers between 0 to 2^16 (65,536). Each consumer is mapped into a single region in this range, so all mapped regions cover the entire range, and no regions overlap. A consumer is selected for a given key by running a modulo operation on the message hash by the range size (65,536). The number received ( 0 <= i < 65,536) is contained within a single region. The consumer mapped to that region is the one selected.

Review Comment:
   I'll add a new line in all headings I've added to be consistent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] tisonkun commented on pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
tisonkun commented on PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#issuecomment-1348804777

   @asafm Yes. It seems authors write diagrams with their preferred tools. For the latest diagrams, you can contact with @DaveDuggins to see what's his tool.
   
   And we may later find a best practice to document on the contribution guide documentation section.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
asafm commented on PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#issuecomment-1348795210

   What is left is that I need to know how to make the diagrams consistent with the rest of the documentation 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Anonymitaet commented on pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#issuecomment-1348507490

   >  Maybe we should note that in the documentation contribution guide.
   Yes, good catch, we should add a tip.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on a diff in pull request #18878: [improve][doc] Added docs for key shared subscription hashing schemes

Posted by GitBox <gi...@apache.org>.
asafm commented on code in PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#discussion_r1047157900


##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,157 @@ In the diagram below, **Consumer A**, **Consumer B** and **Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. When a consumer connects or disconnects, it causes the served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same subscription. Messages are delivered in distribution across consumers and messages with the same key or same ordering key are delivered to only one consumer. No matter how many times the message is re-delivered, it is delivered to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer for a given message key (or ordering key): Sticky, Auto-split Hash Range, and Auto-split Consistent Hashing. The steps for all algorithms are:
+1. The message key (or ordering key) is passed to a hash function (e.g., Murmur3 32-bit), yielding a 32-bit integer hash. 
+2. That hash number is fed to the algorithm to select a consumer from the existing connected consumers.
+
+```
+                      +--------------+                              +-----------+
+Message Key ----->  / Hash Function / ----- hash (32-bit) -------> / Algorithm / ----> Consumer   
+                   +---------------+                               +----------+
+```
+
+
+When a new consumer is connected and thus added to the list of connected consumers, the algorithm re-adjusts the mapping such that some keys currently mapped to existing consumers will be mapped to the newly added consumer. When a consumer is disconnected, thus removed from the list of connected consumers, keys mapped to it will be mapped to other consumers. The sections below will explain how a consumer is selected given the message hash and how the mapping is adjusted given a new consumer is connected or an existing consumer disconnects for each algorithm.
+
+##### Auto-split Hash Range
+
+The algorithm assumes there is a range of numbers between 0 to 2^16 (65,536). Each consumer is mapped into a single region in this range, so all mapped regions cover the entire range, and no regions overlap. A consumer is selected for a given key by running a modulo operation on the message hash by the range size (65,536). The number received ( 0 <= i < 65,536) is contained within a single region. The consumer mapped to that region is the one selected.
+
+Example:

Review Comment:
   <img width="1052" alt="image" src="https://user-images.githubusercontent.com/989425/207334135-f67e36d5-9db7-42d5-8bbd-0424c6ffd0e1.png">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org