Posted to commits@pulsar.apache.org by "ThomasTaketurns (via GitHub)" <gi...@apache.org> on 2023/10/05 09:57:03 UTC

[I] [Bug] ModularLoadManager is not shedding with UniformLoadShedder strategy [pulsar]

ThomasTaketurns opened a new issue, #21299:
URL: https://github.com/apache/pulsar/issues/21299

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.
   
   
   ### Version
   
   Pulsar version : 3.0.0
   
   Broker detailed configuration: 
   
   [brokerconf.txt](https://github.com/apache/pulsar/files/12816895/brokerconf.txt)
   
   
   ### Minimal reproduce step
   
   I have Pulsar running on a K8S cluster with : 
   
   - 2 brokers
   - A single main partitioned topic with 64 partitions.
   - The load balancing is configured as follows:
       loadBalancerDebugModeEnabled: "true"
       loadBalancerSheddingIntervalMinutes: "1"
       loadBalancerSheddingGracePeriodMinutes: "1"
       loadBalancerLoadSheddingStrategy: "org.apache.pulsar.broker.loadbalance.impl.UniformLoadShedder"
       loadBalancerMsgRateDifferenceShedderThreshold: "1"
       loadBalancerMsgThroughputMultiplierDifferenceShedderThreshold: "1.1"
       maxUnloadPercentage: "0.5"
       loadBalancerAutoBundleSplitEnabled: "true"
       loadBalancerAutoUnloadSplitBundlesEnabled: "true"
       defaultNamespaceBundleSplitAlgorithm: "range_equally_divide"
       loadBalancerNamespaceBundleMaxTopics: "4"
   
   N.B.: I deliberately set low values for loadBalancerMsgRateDifferenceShedderThreshold and loadBalancerMsgThroughputMultiplierDifferenceShedderThreshold because, for this test, I want the shedding mechanism to trigger as soon as possible.
   
   - A Horizontal Pod Autoscaler (HPA) configured on the brokers to trigger when CPU usage reaches a given threshold.
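
For reference, the two difference thresholds above can be read as a trigger condition on the gap between the most and least loaded brokers. The sketch below is a simplified model based on the documented meaning of those settings, not the broker's actual code. The class and method names are hypothetical, and both the guard against a zero minimum and the "0 disables the check" rule are assumptions.

```java
// Simplified model of a UniformLoadShedder-style trigger; NOT Pulsar's actual code.
// Class and method names are hypothetical.
final class UniformShedTriggerSketch {

    // Percentage gap between the highest and lowest broker message rates.
    // Assumption: a minimum below 1.0 is treated as 1.0 to avoid dividing by zero.
    static double msgRateDifferencePercent(double maxMsgRate, double minMsgRate) {
        double floor = Math.max(minMsgRate, 1.0);
        return (maxMsgRate - floor) * 100.0 / floor;
    }

    // Ratio between the highest and lowest broker throughput (same zero guard).
    static double throughputMultiplier(double maxThroughput, double minThroughput) {
        return maxThroughput / Math.max(minThroughput, 1.0);
    }

    // Assumption: a threshold of 0 or less disables the corresponding check.
    static boolean shouldTrigger(double ratePercent, double rateThreshold,
                                 double multiplier, double multiplierThreshold) {
        boolean rateExceeded = rateThreshold > 0 && ratePercent > rateThreshold;
        boolean throughputExceeded = multiplierThreshold > 0 && multiplier > multiplierThreshold;
        return rateExceeded || throughputExceeded;
    }

    public static void main(String[] args) {
        // Made-up rates: 50K vs 30K msg/s, 450 vs 100 MB/s.
        double pct = msgRateDifferencePercent(50_000, 30_000);
        double mult = throughputMultiplier(450, 100);
        // Thresholds 1 and 1.1 are the values from the configuration above.
        System.out.println(shouldTrigger(pct, 1, mult, 1.1));
    }
}
```

Under this model, thresholds of 1 and 1.1 are exceeded by almost any imbalance, so the trigger condition alone would not explain the absence of shedding.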
   
   Bundles are initially evenly shared between the 2 brokers.
   Then I start sending messages to the Pulsar topic, which has 8 subscriptions.
   At some point, the average broker CPU usage reaches a threshold and the HPA is triggered.
   After some time, I have 8 brokers available, but the shedding mechanism does not trigger even though only the 2 initial brokers are working.
   
   ![image](https://github.com/apache/pulsar/assets/127508079/25db655b-7703-4b2b-b48f-526ad4fae422)
   ![image](https://github.com/apache/pulsar/assets/127508079/5a33c684-e3c9-4b50-9e43-726b61632fe3)
   
   
   
   ### What did you expect to see?
   
   I would expect the load manager to redirect bundles from taketurns-pulsar-broker-0 and taketurns-pulsar-broker-1 to other available brokers.
   
   ### What did you see instead?
   
   I can see the following message in the logs : 
   
   2023-10-05 11:41:46.486 | 2023-10-05T09:41:46,486+0000 [pulsar-web-42-8] INFO  org.apache.pulsar.broker.loadbalance.extensions.manager.RedirectManager - We don't need to redirect, current load manager class name: org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl
   2023-10-05 11:41:46.483 | 2023-10-05T09:41:46,483+0000 [pulsar-web-42-8] INFO  org.apache.pulsar.broker.loadbalance.extensions.manager.RedirectManager - We don't need to redirect, current load manager class name: org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl
   2023-10-05 11:41:46.292 | 2023-10-05T09:41:46,292+0000 [pulsar-web-42-5] INFO  org.apache.pulsar.broker.loadbalance.extensions.manager.RedirectManager - We don't need to redirect, current load manager class name: org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl
   
   
   ### Anything else?
   
   If I unload the namespace manually, topics are reassigned, but some brokers are still unused and the distribution is not even: taketurns-pulsar-broker-0 and taketurns-pulsar-broker-1 have been completely unloaded, and the other 6 brokers now hold all the partitions.
   
   ./bin/pulsar-admin namespaces unload taketurns/bench
   
   ![image](https://github.com/apache/pulsar/assets/127508079/895f3252-81a2-4708-9e31-39f605663652)
   
   
   Thanks for the help,
   
   Sincerely,
   
   Thomas
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Bug] ModularLoadManager is not shedding with UniformLoadShedder strategy [pulsar]

Posted by "ThomasTaketurns (via GitHub)" <gi...@apache.org>.
ThomasTaketurns commented on issue #21299:
URL: https://github.com/apache/pulsar/issues/21299#issuecomment-1782476936

   (I was in debug mode and did not retrieve any relevant info)




Re: [I] [Bug] ModularLoadManager is not shedding with UniformLoadShedder strategy [pulsar]

Posted by "ThomasTaketurns (via GitHub)" <gi...@apache.org>.
ThomasTaketurns commented on issue #21299:
URL: https://github.com/apache/pulsar/issues/21299#issuecomment-1776662196

   Hi @heesung-sn , 
   
   Thanks for getting back to me.
   I am working on something else right now, but I will get back to you with the requested information later this week.




Re: [I] [Bug] ModularLoadManager is not shedding with UniformLoadShedder strategy [pulsar]

Posted by "heesung-sn (via GitHub)" <gi...@apache.org>.
heesung-sn commented on issue #21299:
URL: https://github.com/apache/pulsar/issues/21299#issuecomment-1783606532

       @FieldContext(
               dynamic = true,
               category = CATEGORY_LOAD_BALANCER,
               doc = "In the UniformLoadShedder strategy, the minimum message that triggers unload."
       )
       private int minUnloadMessage = 1000;
   
       @FieldContext(
               dynamic = true,
               category = CATEGORY_LOAD_BALANCER,
               doc = "In the UniformLoadShedder strategy, the minimum throughput that triggers unload."
       )
       private int minUnloadMessageThroughput = 1 * 1024 * 1024;
       
   
   I think the UniformLoadShedder logic uses these minimum thresholds to decide whether to trigger unloading. Please set smaller values for them in broker.conf and try again.
   
   Again, the ThresholdShedder strategy is the default one (and the recommended one in the current Pulsar version).
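
The effect of such minimum thresholds can be illustrated with a small sketch. It is a hypothetical model, not the UniformLoadShedder source: the class and method names are invented, the traffic figure is made up, and only the two default values are taken from the snippet above.

```java
// Hypothetical illustration of a minimum-threshold gate; NOT the UniformLoadShedder source.
final class MinUnloadGateSketch {

    // Defaults copied from the quoted configuration fields.
    static final int DEFAULT_MIN_UNLOAD_MESSAGE = 1000;
    static final int DEFAULT_MIN_UNLOAD_MESSAGE_THROUGHPUT = 1 * 1024 * 1024;

    // Assumption: load to offload is ignored unless it reaches the configured minimum.
    static boolean reachesMinimum(double loadToOffload, int configuredMinimum) {
        return loadToOffload >= configuredMinimum;
    }

    public static void main(String[] args) {
        double msgRateToOffload = 400; // msg/s, made-up test-sized traffic
        // Compared against the default minimum, then against a lowered one.
        System.out.println(reachesMinimum(msgRateToOffload, DEFAULT_MIN_UNLOAD_MESSAGE));
        System.out.println(reachesMinimum(msgRateToOffload, 100));
    }
}
```

If a gate of this kind is in play, a low-traffic test can stay below the defaults indefinitely, which is why lowering the values is worth trying.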
   
       




Re: [I] [Bug] ModularLoadManager is not shedding with UniformLoadShedder strategy [pulsar]

Posted by "heesung-sn (via GitHub)" <gi...@apache.org>.
heesung-sn commented on issue #21299:
URL: https://github.com/apache/pulsar/issues/21299#issuecomment-1785536827

   `loadBalancerBundleUnloadMinThroughputThreshold=0`
   
   Please try to set this config to zero in broker.conf




Re: [I] [Bug] ModularLoadManager is not shedding with UniformLoadShedder strategy [pulsar]

Posted by "heesung-sn (via GitHub)" <gi...@apache.org>.
heesung-sn closed issue #21299: [Bug] ModularLoadManager is not shedding with UniformLoadShedder strategy
URL: https://github.com/apache/pulsar/issues/21299




Re: [I] [Bug] ModularLoadManager is not shedding with UniformLoadShedder strategy [pulsar]

Posted by "ThomasTaketurns (via GitHub)" <gi...@apache.org>.
ThomasTaketurns commented on issue #21299:
URL: https://github.com/apache/pulsar/issues/21299#issuecomment-1785608895

   Hi @heesung-sn ,
   
   Thanks for the quick feedback.
   I applied your recommendation and it seems to do the job.
   
   Let me perform a few more tests and I will close the ticket tomorrow if everything is OK.
   
   Thanks for helping me, have a good day!
   
   




Re: [I] [Bug] ModularLoadManager is not shedding with UniformLoadShedder strategy [pulsar]

Posted by "heesung-sn (via GitHub)" <gi...@apache.org>.
heesung-sn commented on issue #21299:
URL: https://github.com/apache/pulsar/issues/21299#issuecomment-1765072630

   - Can you try to grep all load balancer broker configs? I think you need to confirm if the load balancer and shedding are enabled first.
   For example, `grep loadBal broker.conf`
   
   - Any reason not to try `ThresholdShedder`, as that's the default option now?
   - Also, can you plot the unloading metrics? 
   ```
   Name | Type | Description
   -- | -- | --
   pulsar_lb_unload_broker_total | Counter | Unload broker count in this bundle unloading
   pulsar_lb_unload_bundle_total | Counter | Bundle unload count in this bundle unloading. If the value of pulsar_lb_unload_bundle_total is greater than zero, it means that the bundle has been unloaded.
   ```
   - Please try the debug mode and see if it gives some more info.
   
   
   




Re: [I] [Bug] ModularLoadManager is not shedding with UniformLoadShedder strategy [pulsar]

Posted by "ThomasTaketurns (via GitHub)" <gi...@apache.org>.
ThomasTaketurns commented on issue #21299:
URL: https://github.com/apache/pulsar/issues/21299#issuecomment-1785498101

   Hi @heesung-sn ,
   
   By applying your recommendations, I was able to make the UniformLoadShedder trigger bundle unloading.
   I still need to tune it to make it efficient, but it works.
   
   Since I want to follow your recommendations, I am now trying to make the ThresholdShedder work, but I never see any bundle or broker being unloaded.
   
   Here are my observations : 
   
   Broker publish/dispatch rates : 
   
   ![image](https://github.com/apache/pulsar/assets/127508079/b42ff51e-3727-404b-aa32-ccc1f22c0ed4)
   
   Here is the cpu usage for each broker replica : 
   
   ![image](https://github.com/apache/pulsar/assets/127508079/1e01327b-4d6e-4970-9c1d-03a0427fc738)
   
   I rely only on CPU to evaluate the usage difference between brokers:
   
       loadBalancerBandwithInResourceWeight: "0"
       loadBalancerBandwithOutResourceWeight: "0"
       loadBalancerCPUResourceWeight: "1"
       loadBalancerMemoryResourceWeight: "0"
       loadBalancerDirectMemoryResourceWeight: "0"
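
To make the effect of these weights concrete, here is a minimal sketch of a weighted-maximum usage score and a comparison against the cluster average. It is a simplified model under stated assumptions, not the ThresholdShedder implementation: the names are hypothetical, the usage figures and the 0.10 margin are illustrative, and any history smoothing or minimum-throughput guards the real shedder may apply are left out.

```java
// Simplified model; NOT the ThresholdShedder implementation. Names are hypothetical.
final class WeightedUsageSketch {

    // Score = largest (usage percent * weight) over all resources, scaled to 0..1.
    static double weightedMaxUsage(double[] usagePercent, double[] weights) {
        double max = 0.0;
        for (int i = 0; i < usagePercent.length; i++) {
            max = Math.max(max, usagePercent[i] * weights[i]);
        }
        return max / 100.0;
    }

    // Assumption: a broker is a shedding candidate only if its score exceeds
    // the cluster average by more than the margin.
    static boolean isOverloaded(double brokerScore, double averageScore, double margin) {
        return brokerScore > averageScore + margin;
    }

    public static void main(String[] args) {
        // Order: cpu, memory, directMemory, bandwidthIn, bandwidthOut.
        double[] cpuOnly = {1, 0, 0, 0, 0};
        // Made-up usage percentages for one busy and one idle broker.
        double busy = weightedMaxUsage(new double[] {80, 60, 90, 20, 20}, cpuOnly);
        double idle = weightedMaxUsage(new double[] {5, 40, 10, 0, 0}, cpuOnly);
        double average = (busy + idle) / 2;
        System.out.println(isOverloaded(busy, average, 0.10));
    }
}
```

With CPU-only weights, memory and bandwidth never influence the score in this model, so only the CPU gap between brokers matters for the comparison.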
   
   However, even though the newly created brokers have very low CPU usage, no unloading happens at all:
   
   ![image](https://github.com/apache/pulsar/assets/127508079/ab5ccdd7-36f1-41de-8d55-23c66ccac1b0)
   
   Please find my broker configuration attached : 
   [brokerconf.txt](https://github.com/apache/pulsar/files/13207039/brokerconf.txt)
   
   Am I missing something here? Do you know why the shedding is not triggered?
   
   Many thanks,
   
   Thomas
   
   
   
   




Re: [I] [Bug] ModularLoadManager is not shedding with UniformLoadShedder strategy [pulsar]

Posted by "ThomasTaketurns (via GitHub)" <gi...@apache.org>.
ThomasTaketurns commented on issue #21299:
URL: https://github.com/apache/pulsar/issues/21299#issuecomment-1781322987

   Hi @heesung-sn ,
   
   Please find attached the broker configuration.
   
   [brokerconf.txt](https://github.com/apache/pulsar/files/13179277/brokerconf.txt)
   
   I found it in the broker logs at startup:
   [main] INFO  org.apache.pulsar.broker.PulsarService - messaging service is ready, bootstrap_seconds=14, bootstrap service port = 8080, ...etc.
   
   I was not able to make the ThresholdShedder work either, but I may end up using it if I find a way to make it work.
   
   Here are the unloading metrics for today's test period (no broker or bundle was unloaded):
   ![image](https://github.com/apache/pulsar/assets/127508079/abb7023c-81c4-44eb-8f07-732ed0a03bf5)
   
   And this is the replica count metric for the broker on the same period : 
   ![image](https://github.com/apache/pulsar/assets/127508079/79a3ef7b-7e51-433c-a8d9-a50560c141b1)
   
   As well as the brokers publish and dispatch rates on the same period : 
   ![image](https://github.com/apache/pulsar/assets/127508079/6f0be33e-954f-44bb-b03d-83a36355eb14)
   
   Please tell me if you need anything else !
   
   Thank you for your help,
   
   Sincerely,
   
   Thomas
   
   
   

