Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/04/20 14:17:04 UTC

[GitHub] [pulsar] baomingyu opened a new issue #10284: Occasional consumer stucked when restart consumer whit key_shared subscription type .

baomingyu opened a new issue #10284:
URL: https://github.com/apache/pulsar/issues/10284


   In the following scenario, a consumer gets stuck after a restart.
   Step 1: two consumers, e.g. consumer1 and consumer2, use the key_shared subscription type in the same group.
   Step 2: the broker receives consumer1's flow command with 1000 permits but has not yet received consumer2's flow command.
   Step 3: the broker starts sending messages, but the keys of the messages it read are all assigned to consumer2, which has no permits yet, so it does not send any message to either consumer.
   Step 4: on the next dispatch loop, getRestrictedMaxEntriesForConsumer always returns 0, so no messages are ever sent.
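
   The steps above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the broker's dispatcher code: ToyConsumer, restrictedMaxEntries and dispatchOnce are hypothetical names, and the per-consumer limit is assumed to be capped only by the consumer's permits. It shows that a dispatch pass makes no progress while every key read belongs to a consumer without permits. It does not model why the real broker fails to recover afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeySharedStallSketch {

    /** Hypothetical stand-in for a consumer; it only tracks flow permits. */
    static final class ToyConsumer {
        final String name;
        int permits;
        final List<String> received = new ArrayList<>();

        ToyConsumer(String name, int permits) {
            this.name = name;
            this.permits = permits;
        }
    }

    /** Toy per-consumer limit (assumption): never more than the consumer's permits. */
    static int restrictedMaxEntries(ToyConsumer consumer, int candidates) {
        return Math.min(candidates, consumer.permits);
    }

    /**
     * One dispatch pass: group the keys by owning consumer, then send each
     * group up to that consumer's limit. Returns the number of messages sent.
     */
    static int dispatchOnce(List<String> keys, Map<String, ToyConsumer> owners) {
        Map<ToyConsumer, List<String>> grouped = new LinkedHashMap<>();
        for (String key : keys) {
            grouped.computeIfAbsent(owners.get(key), c -> new ArrayList<>()).add(key);
        }
        int sent = 0;
        for (Map.Entry<ToyConsumer, List<String>> entry : grouped.entrySet()) {
            ToyConsumer consumer = entry.getKey();
            int max = restrictedMaxEntries(consumer, entry.getValue().size());
            consumer.received.addAll(entry.getValue().subList(0, max));
            consumer.permits -= max;
            sent += max;
        }
        return sent;
    }

    public static void main(String[] args) {
        ToyConsumer consumer1 = new ToyConsumer("consumer1", 1000); // flow command received
        ToyConsumer consumer2 = new ToyConsumer("consumer2", 0);    // flow command not yet received

        // Assumption for the demo: every key that was read is assigned to consumer2.
        List<String> keys = Arrays.asList("k1", "k2", "k3");
        Map<String, ToyConsumer> owners = new HashMap<>();
        for (String key : keys) {
            owners.put(key, consumer2);
        }

        for (int pass = 1; pass <= 3; pass++) {
            System.out.println("pass " + pass + ": sent " + dispatchOnce(keys, owners));
        }
        System.out.println("consumer1 permits still unused: " + consumer1.permits);

        consumer2.permits = 1000; // simulate consumer2's flow command arriving
        System.out.println("after consumer2 flow: sent " + dispatchOnce(keys, owners));
    }
}
```

   In this model every pass sends 0 messages until consumer2 has permits, even though consumer1 holds 1000 unused permits throughout.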


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] james-bright-helix commented on issue #10284: Occasional consumer stucked when restart consumer whit key_shared subscription type .

Posted by GitBox <gi...@apache.org>.
james-bright-helix commented on issue #10284:
URL: https://github.com/apache/pulsar/issues/10284#issuecomment-963971211


   We see this problem roughly 50% of the time when we do a rolling restart of our consumers, e.g., 5 at a time for a total of 20. Normally the only way to fix it is to unload the bundle housing the topic. We've been experiencing this on all versions since 2.6.3 and have seen several "stuck consumer" issues marked as resolved, but issues with key_shared still remain. Is there anything we can do when we experience this issue to help get it fixed?





[GitHub] [pulsar] codelipenghui closed issue #10284: Occasional consumer stucked when restart consumer whit key_shared subscription type .

Posted by GitBox <gi...@apache.org>.
codelipenghui closed issue #10284:
URL: https://github.com/apache/pulsar/issues/10284


   





[GitHub] [pulsar] james-bright-helix edited a comment on issue #10284: Occasional consumer stucked when restart consumer whit key_shared subscription type .

Posted by GitBox <gi...@apache.org>.
james-bright-helix edited a comment on issue #10284:
URL: https://github.com/apache/pulsar/issues/10284#issuecomment-964309950


   > @james-bright-helix Do you have a way to reproduce the issue on 2.8.1?
   
   @codelipenghui Not consistently, at least not in a way that isn't disruptive. We have to bounce our production app, and then it happens frequently. We see it very rarely in our non-production environments, which are much smaller. Are there any additional logs or metrics we can gather to share when it does happen?
   One thing we noticed is that if you bounce only some of the consumers, e.g., 5 of 20 consumers, then the backlog is sometimes processed for a while before stopping again. Unloading the topic/namespace has been our only consistent way to recover.





[GitHub] [pulsar] james-bright-helix commented on issue #10284: Occasional consumer stucked when restart consumer whit key_shared subscription type .

Posted by GitBox <gi...@apache.org>.
james-bright-helix commented on issue #10284:
URL: https://github.com/apache/pulsar/issues/10284#issuecomment-964121389


   @codelipenghui Sorry, my message wasn't clear. We're on 2.8.1 and have tried every version since 2.6.3, but we are still suffering from stuck key_shared consumers. I was hoping the attached PR would fix our issue, but it does not seem to be progressing, hence the offer to help by providing more details.





[GitHub] [pulsar] codelipenghui commented on issue #10284: Occasional consumer stucked when restart consumer whit key_shared subscription type .

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #10284:
URL: https://github.com/apache/pulsar/issues/10284#issuecomment-964292795


   @james-bright-helix Do you have a way to reproduce the issue on 2.8.1?





[GitHub] [pulsar] james-bright-helix edited a comment on issue #10284: Occasional consumer stucked when restart consumer whit key_shared subscription type .

Posted by GitBox <gi...@apache.org>.
james-bright-helix edited a comment on issue #10284:
URL: https://github.com/apache/pulsar/issues/10284#issuecomment-965270390


   @codelipenghui I thought I should mention that this is possibly related to #12208: we use reconsumeLaterAsync() on all topics (if we hit an environmental error, we want to retry after a delay), and on one topic we also use deliverAfter(), although we don't usually see anything on the retry topics when it's stuck. As mentioned in that issue, it's not clear whether these are expected to work with key_shared subscriptions.





[GitHub] [pulsar] codelipenghui commented on issue #10284: Occasional consumer stucked when restart consumer whit key_shared subscription type .

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #10284:
URL: https://github.com/apache/pulsar/issues/10284#issuecomment-964078760


   @james-bright-helix You can try out 2.8.1 or 2.7.3, both of which contain https://github.com/apache/pulsar/pull/10920










