Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/04/08 01:34:40 UTC

[GitHub] [pulsar] devinbost commented on issue #6054: Catastrophic frequent random topic freezes, especially on high-traffic topics.

devinbost commented on issue #6054:
URL: https://github.com/apache/pulsar/issues/6054#issuecomment-815382279


   > > After taking a closer look at #7266, it's not clear if it will resolve the issue because we're seeing this issue with functions running with exclusive subscriptions, not shared subscriptions.
   > 
   > You sure? I didn't know it was possible to have a function on an exclusive subscription.
   
   I double-checked, and you were right that we're using shared subscriptions. However, with the way we're using the functions, each function has its own subscription, so we aren't routing messages between multiple consumers of the same shared subscription. It looks like the behavior mentioned in #7266 applies primarily to cases involving multiple consumers on a single shared subscription. 
   
   Regardless, I noticed that we're not checking for success when sending permits from the consumer to the broker (https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L857), and every time we attempt to send permits, we reset the consumer's permits (https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L1368). So if there's a communication problem between the consumer and the broker, their permit counts could get out of sync. This PR should help by getting the permits back on track when that happens: https://github.com/apache/pulsar/pull/9789
   Also, I created this PR to log communication failures that occur when sending available permits to the broker: https://github.com/apache/pulsar/pull/10166
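
To make the suspected failure mode concrete, here is a minimal sketch of how an unchecked send combined with an unconditional reset could let the two sides drift apart. It is a toy model, not Pulsar's actual code: the `Broker` and `Consumer` classes, the `delivered` flag, and the `drift` helper are all hypothetical names introduced for illustration only.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of consumer/broker flow-control permits; NOT Pulsar's implementation. */
public class PermitDriftSketch {

    /** Hypothetical stand-in for the broker's view of one consumer's permits. */
    static final class Broker {
        int permitsReceived = 0;

        void onFlow(int n) {
            permitsReceived += n;
        }
    }

    /** Hypothetical consumer that resets its counter whether or not the send succeeded. */
    static final class Consumer {
        final AtomicInteger availablePermits = new AtomicInteger();
        int permitsSent = 0; // what the consumer believes it has granted to the broker

        void messageProcessed() {
            availablePermits.incrementAndGet();
        }

        void sendFlow(Broker broker, boolean delivered) {
            // The reset happens unconditionally, mirroring the concern in the comment.
            int n = availablePermits.getAndSet(0);
            permitsSent += n;
            if (delivered) {
                broker.onFlow(n); // an undelivered command never reaches the broker
            }
        }
    }

    /** Returns permits the consumer believes it granted minus permits the broker received. */
    static int drift(boolean[] deliveries, int messagesPerRound) {
        Broker broker = new Broker();
        Consumer consumer = new Consumer();
        for (boolean delivered : deliveries) {
            for (int i = 0; i < messagesPerRound; i++) {
                consumer.messageProcessed();
            }
            consumer.sendFlow(broker, delivered);
        }
        return consumer.permitsSent - broker.permitsReceived;
    }

    public static void main(String[] args) {
        // Second of three flow commands is lost: the broker is short by one round.
        System.out.println(drift(new boolean[] {true, false, true}, 10));
    }
}
```

In this model every lost flow command permanently removes `messagesPerRound` permits from the broker's view, because the consumer has already zeroed its own counter and has nothing left to resend. Checking the send result before resetting, or periodically reconciling the two counts, would close that gap in the sketch.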
   
   However, it's still not clear to me how negative permits are getting calculated when only a single consumer exists for a shared subscription. The consumer either increases permits or resets them to 0. The consumer never decrements permits. This may explain why I was unable to see negative permits on the consumer in the function heap dumps. 
   I'm hoping that @rdhabalia will chime in to share some insight into this. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org