Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/05/09 15:41:11 UTC

[GitHub] [pulsar] t3link opened a new issue #10523: Key_Shared subscription is not evenly distributed

t3link opened a new issue #10523:
URL: https://github.com/apache/pulsar/issues/10523


   I tested the `pulsar 2.7.1` Docker image on my local machine. The Java client version is also `2.7.1`.
   A single app container runs `1` producer and `4` consumers, configured as below.
   The producer sends messages in order, with message keys cycling from `test-0` to `test-9`.
   ```java
   private static final AtomicLong KEY_GENERATOR = new AtomicLong();
   
   // Despite the name, keys are not random: they cycle test-0, test-1, ..., test-9.
   private static String randomKey() {
       var idx = KEY_GENERATOR.getAndIncrement() % 10;
       return "test-" + idx;
   }
   
   @Bean
   public Producer<String> producer(PulsarClient client) throws PulsarClientException {
       return client.newProducer(Schema.STRING)
               .topic("topic-test")
               .sendTimeout(10, TimeUnit.SECONDS)
               .create();
   }
   
   @Bean
   public List<Consumer<String>> consumer(PulsarClient client) throws PulsarClientException {
       var consumers = new ArrayList<Consumer<String>>(4);
       for (var i = 0; i < 4; i++) {
           var consumer = client.newConsumer(Schema.STRING)
                   .topic("topic-test")
                   .subscriptionName("sub-test")
                   .subscriptionType(SubscriptionType.Key_Shared)
                   .subscribe();
           consumers.add(consumer);
       }
       return consumers;
   }
   ```
   When I first ran only one app container, the consumer statistics showed that the keys were not evenly distributed.
   ```
   {
       "consumer-thread-2":[
           "test-2",
           "test-1",
           "test-0",
           "test-6",
           "test-4"
       ],
       "consumer-thread-3":[
           "test-9"
       ],
       "consumer-thread-0":[
           "test-3"
       ],
       "consumer-thread-1":[
           "test-8",
           "test-7",
           "test-5"
       ]
   }
   ```
   
   Then, 30 seconds later, I started a second app container, so two containers were running at the same time.
   
   The consumer statistics of the first container changed!
   ```
   {
       "consumer-thread-2":[
           "test-4"
       ],
       "consumer-thread-3":[
           "test-9"
       ],
       "consumer-thread-0":[
           "test-3"
       ],
       "consumer-thread-1":[
           "test-8",
           "test-7"
       ]
   }
   ```
   
   And the consumer statistics of the newer container look like this.
   ```
   {
       "consumer-thread-0":[
           "test-2",
           "test-1",
           "test-0",
           "test-6"
       ],
       "consumer-thread-1":[
           "test-5"
       ]
   }
   ```
   
   **Some consumers in the newer container are not assigned any messages at all.**
   
   Are there any best practices for Key_Shared subscriptions? I think this is a common situation in production: app containers may be deployed dynamically, which changes the set of consumers on the subscription.
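
   A rough way to see how ten keys can spread this unevenly is to hash each key into a slot and map slots to consumers by range. The sketch below is an illustration only, under loud assumptions: `CRC32` stands in for whatever hash the broker uses, the slot space is taken to be 0..65535, and the ranges are assumed to be equal-width. The class and method names (`SlotRangeSketch`, `slotOf`, `consumerOf`) are hypothetical and not part of Pulsar.

   ```java
   import java.nio.charset.StandardCharsets;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;
   import java.util.zip.CRC32;

   // Illustration only: CRC32 is a stand-in hash, not the broker's hash function.
   final class SlotRangeSketch {
       static final int SLOTS = 65536; // assumed slot space 0..65535

       // Map a key to a slot in [0, SLOTS).
       static int slotOf(String key) {
           CRC32 crc = new CRC32();
           crc.update(key.getBytes(StandardCharsets.UTF_8));
           return (int) (crc.getValue() % SLOTS);
       }

       // Assumption: consumer i owns the equal-width range [i * width, (i + 1) * width).
       // The last consumer also absorbs any remainder slots.
       static int consumerOf(String key, int consumers) {
           int width = SLOTS / consumers;
           return Math.min(slotOf(key) / width, consumers - 1);
       }

       public static void main(String[] args) {
           Map<Integer, List<String>> assignment = new TreeMap<>();
           for (int i = 0; i < 10; i++) {
               String key = "test-" + i;
               assignment.computeIfAbsent(consumerOf(key, 4), c -> new ArrayList<>()).add(key);
           }
           // Print which keys each consumer index would receive under these assumptions.
           assignment.forEach((consumer, keys) ->
                   System.out.println("consumer-" + consumer + " -> " + keys));
       }
   }
   ```

   With only ten distinct keys there are only ten slots in play, so nothing forces them to fall evenly across four ranges; a range that happens to contain no key's slot receives no messages.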


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] mrkingfoxx commented on issue #10523: Key_Shared subscription is not evenly distributed

Posted by GitBox <gi...@apache.org>.
mrkingfoxx commented on issue #10523:
URL: https://github.com/apache/pulsar/issues/10523#issuecomment-835871650


   ``





[GitHub] [pulsar] mrkingfoxx commented on issue #10523: Key_Shared subscription is not evenly distributed

Posted by GitBox <gi...@apache.org>.
mrkingfoxx commented on issue #10523:
URL: https://github.com/apache/pulsar/issues/10523#issuecomment-835973093







[GitHub] [pulsar] t3link closed issue #10523: Key_Shared subscription is not evenly distributed

Posted by GitBox <gi...@apache.org>.
t3link closed issue #10523:
URL: https://github.com/apache/pulsar/issues/10523


   





[GitHub] [pulsar] mrkingfoxx removed a comment on issue #10523: Key_Shared subscription is not evenly distributed

Posted by GitBox <gi...@apache.org>.
mrkingfoxx removed a comment on issue #10523:
URL: https://github.com/apache/pulsar/issues/10523#issuecomment-835871650


   ``





[GitHub] [pulsar] t3link commented on issue #10523: Key_Shared subscription is not evenly distributed

Posted by GitBox <gi...@apache.org>.
t3link commented on issue #10523:
URL: https://github.com/apache/pulsar/issues/10523#issuecomment-836707922


   @codelipenghui Thanks! After having a look at `PersistentStickyKeyDispatcherMultipleConsumers.java`, I think none of the three selector strategies suits my situation, because they all use `Murmur3_32Hash` to calculate a slot or range. That would likely work well when the message key set is large. I'll consider splitting my messages across several independent topics, each of which can then be subscribed to using `ConsistentHashingStickyKeyConsumerSelector`. The topic for each message can be chosen by a simple modulo operation.
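
   The modulo-based split could be sketched roughly as follows. This is a minimal sketch, not a tested design: the topic naming scheme (`topic-test-0`, `topic-test-1`, ...), the class `TopicRouterSketch`, and the method `topicFor` are all hypothetical, and it assumes each message carries a numeric key index like the `idx` used to build `test-<idx>` above.

   ```java
   // Hypothetical helper: picks one of N topics from a numeric key index.
   final class TopicRouterSketch {

       // floorMod keeps the result in [0, topicCount) even for negative indexes.
       static String topicFor(long keyIndex, int topicCount) {
           if (topicCount <= 0) {
               throw new IllegalArgumentException("topicCount must be positive");
           }
           return "topic-test-" + Math.floorMod(keyIndex, (long) topicCount);
       }

       public static void main(String[] args) {
           // Show where each of the ten key indexes would be routed with four topics.
           for (long idx = 0; idx < 10; idx++) {
               System.out.println("test-" + idx + " -> " + topicFor(idx, 4));
           }
       }
   }
   ```

   Because the index is spread by plain modulo rather than by a hash, consecutive key indexes rotate through the topics in turn, so a small key set is divided as evenly as its size allows.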





[GitHub] [pulsar] mrkingfoxx commented on issue #10523: Key_Shared subscription is not evenly distributed

Posted by GitBox <gi...@apache.org>.
mrkingfoxx commented on issue #10523:
URL: https://github.com/apache/pulsar/issues/10523#issuecomment-856439509


   Done





[GitHub] [pulsar] codelipenghui commented on issue #10523: Key_Shared subscription is not evenly distributed

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #10523:
URL: https://github.com/apache/pulsar/issues/10523#issuecomment-835971460


   @mrkingfoxx Messages for a Key_Shared subscription are dispatched by the hash of the key. By default we have hash slots in the range [0, 65535], and each consumer receives messages from a fixed hash range, so an even distribution cannot be guaranteed. You can try consistent hashing for the Key_Shared subscription by setting `subscriptionKeySharedUseConsistentHashing=true` in the broker configuration (the defaults are shown below):
   
   ```
   # On KeyShared subscriptions, with default AUTO_SPLIT mode, use splitting ranges or
   # consistent hashing to reassign keys to new consumers
   subscriptionKeySharedUseConsistentHashing=false
   
   # On KeyShared subscriptions, number of points in the consistent-hashing ring.
   # The higher the number, the more equal the assignment of keys to consumers
   subscriptionKeySharedConsistentHashingReplicaPoints=100
   ```
   
   You can also increase `subscriptionKeySharedConsistentHashingReplicaPoints` to get a more even distribution.
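
   To illustrate the idea behind replica points, here is a generic consistent-hashing ring. It is a sketch of the general technique and not the broker's implementation: `CRC32` is a stand-in hash, and the names `HashRingSketch`, `hash`, and `select` are made up for this example.

   ```java
   import java.nio.charset.StandardCharsets;
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;
   import java.util.zip.CRC32;

   // Generic consistent-hash ring; illustration only, not Pulsar's selector.
   final class HashRingSketch {
       private final TreeMap<Long, String> ring = new TreeMap<>();

       // Each consumer is placed on the ring at `replicaPoints` positions.
       // If two points collide, the later one simply overwrites the earlier.
       HashRingSketch(List<String> consumers, int replicaPoints) {
           if (consumers.isEmpty() || replicaPoints <= 0) {
               throw new IllegalArgumentException("need at least one consumer and one point");
           }
           for (String consumer : consumers) {
               for (int i = 0; i < replicaPoints; i++) {
                   ring.put(hash(consumer + "#" + i), consumer);
               }
           }
       }

       // Stand-in hash (assumption): CRC32 of the UTF-8 bytes.
       static long hash(String s) {
           CRC32 crc = new CRC32();
           crc.update(s.getBytes(StandardCharsets.UTF_8));
           return crc.getValue();
       }

       // A key goes to the first ring point at or after its hash, wrapping around.
       String select(String key) {
           Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
           return (entry != null ? entry : ring.firstEntry()).getValue();
       }

       public static void main(String[] args) {
           List<String> consumers = List.of("c0", "c1", "c2", "c3");
           for (int points : new int[] {1, 100, 1000}) {
               HashRingSketch r = new HashRingSketch(consumers, points);
               Map<String, Integer> counts = new TreeMap<>();
               for (int i = 0; i < 100_000; i++) {
                   counts.merge(r.select("key-" + i), 1, Integer::sum);
               }
               // Print how many of the keys each consumer would receive.
               System.out.println(points + " points -> " + counts);
           }
       }
   }
   ```

   More points per consumer cut the ring into more, smaller arcs, which tends to even out each consumer's share when there are many distinct keys. With only a handful of keys, as in the report above, the assignment can still be lopsided.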





[GitHub] [pulsar] mrkingfoxx commented on issue #10523: Key_Shared subscription is not evenly distributed

Posted by GitBox <gi...@apache.org>.
mrkingfoxx commented on issue #10523:
URL: https://github.com/apache/pulsar/issues/10523#issuecomment-835871561



