You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/09/09 11:12:09 UTC

[GitHub] [pulsar] dragonls opened a new pull request, #17573: [fix][broker] Fix unexpected subscription deletion

dragonls opened a new pull request, #17573:
URL: https://github.com/apache/pulsar/pull/17573

   Fixes #17572
   
   ### Motivation
   
   The `lastActive` in `ManagedCursorImpl` only be updated in 3 places:
   - `org.apache.pulsar.broker.service.persistent.PersistentSubscription#addConsumer`
   - `org.apache.pulsar.broker.service.persistent.PersistentSubscription#removeConsumer`
   - `org.apache.bookkeeper.mledger.impl.ManagedCursorImpl#internalResetCursor`
   
   If there are brokers shutdown with OOM or other situation, which make the broker not shutdown gracefully. And the new broker load the topics and then do subscription expiry check soon before consumer reconnected, then the subscription may be deleted unexpectedly.
   
   Need to update `lastActive` in `ManagedCursorImpl` while consuming stably.
   
   ### Modifications
   
   - Add  `cursor.updateLastActive()` in `org.apache.pulsar.broker.service.persistent.PersistentSubscription#acknowledgeMessage` and `org.apache.pulsar.broker.service.nonpersistent.NonPersistentSubscription#acknowledgeMessage`.
   - Add some log about cursor last active timestamp.
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   This change added tests and can be verified as follows:
     - *TODO*
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If the box was checked, please highlight the changes*
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] Anything that affects deployment
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc-required` 
   (Your PR needs to update docs and you will update later)
   
   - [ ] `doc-not-needed` 
   (Please explain why)
   
   - [ ] `doc` 
   (Your PR contains doc changes)
   
   - [ ] `doc-complete`
   (Docs have been already added)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui commented on pull request #17573: [fix][broker] Fix unexpected subscription deletion caused by the cursor last active time not updated in time

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on PR #17573:
URL: https://github.com/apache/pulsar/pull/17573#issuecomment-1249223938

   @dragonls I have approved the PR. Please also update your last comment to the PR description. So that we don't to go through all the comment to understand what is the scope of this PR and what is not the scope of this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] dragonls commented on pull request #17573: [fix][broker] Fix unexpected subscription deletion

Posted by GitBox <gi...@apache.org>.
dragonls commented on PR #17573:
URL: https://github.com/apache/pulsar/pull/17573#issuecomment-1248158692

   > > If there are brokers shutdown with OOM or other situation, which make the broker not shutdown gracefully, the new broker load the topics and then do subscription expiry check soon before consumer reconnected, then the subscription may be deleted unexpectedly.
   > 
   > If the topic is loaded by another broker, the `lastActive` will be reset to `clock.millis()`
   > 
   > https://github.com/apache/pulsar/blob/a6fe5bb2f88c946e1bc27a45fad1751aa76bd971/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L305
   > 
   > How the above situation can happen?
   
   `lastActive` is set to `clock.millis()` while initilized, but will be covered by `recover()` soon.
   https://github.com/apache/pulsar/blob/a6fe5bb2f88c946e1bc27a45fad1751aa76bd971/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L414


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui commented on pull request #17573: [fix][broker] Fix unexpected subscription deletion

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on PR #17573:
URL: https://github.com/apache/pulsar/pull/17573#issuecomment-1248214598

   > lastActive is set to clock.millis() while initilized, but will be covered by recover() soon.
   
   I see.
   
   Actually, only ledger rollover and updating the cursor Ledger failed will cause the `lastActive` update to zookeeper right?
   It also has a chance to delete the subscription unexpectedly if no ledger rollover and cursor ledger update failure happens.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] dragonls commented on pull request #17573: [fix][broker] Fix unexpected subscription deletion

Posted by GitBox <gi...@apache.org>.
dragonls commented on PR #17573:
URL: https://github.com/apache/pulsar/pull/17573#issuecomment-1248861976

   > If we can persistent the `lastActive` to cursor Ledger, we will get better for this situation.
   
   I will take a look at this and try to persistent the `lastActive` to cursor ledger.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] HQebupt commented on pull request #17573: [fix][broker] Fix unexpected subscription deletion

Posted by GitBox <gi...@apache.org>.
HQebupt commented on PR #17573:
URL: https://github.com/apache/pulsar/pull/17573#issuecomment-1243745878

   Please help to add a test to avoid the regression.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] dragonls commented on pull request #17573: [fix][broker] Fix unexpected subscription deletion

Posted by GitBox <gi...@apache.org>.
dragonls commented on PR #17573:
URL: https://github.com/apache/pulsar/pull/17573#issuecomment-1249202159

   After discussed with @codelipenghui , this PR mainly fix the bug that subscription may be deleted unexpectedly caused by `lastActive` not being updated in time, which depends on the `lastActive` persistence in zk.
   
   `lastActive` is saved in zk in `org.apache.bookkeeper.mledger.impl.ManagedCursorImpl#persistPositionMetaStore`:
   https://github.com/apache/pulsar/blob/d7c09be15c3502e95bbd5862b210cc8d7f473c1e/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L2500-L2514
   
   However, the persistence is triggered only in 4 places:
   ![image](https://user-images.githubusercontent.com/2565118/190616653-58d7ec18-e998-4e7c-b39c-a1689516e307.png)
   
   All of them are not high frequency triggered, which means the unexpectedly subscription deletion still can happen during two persistence with broker shutdown not gracefully.
   
   The better solution is to save the `lastActive` into cursor ledger instead of only save to zk, but this is not the problem this PR solving, and need further discussion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui commented on pull request #17573: [fix][broker] Fix unexpected subscription deletion

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on PR #17573:
URL: https://github.com/apache/pulsar/pull/17573#issuecomment-1248228956

   If we can persistent the `lastActive` to cursor Ledger, we will get better for this situation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] HQebupt commented on a diff in pull request #17573: [fix][broker] Fix unexpected subscription deletion

Posted by GitBox <gi...@apache.org>.
HQebupt commented on code in PR #17573:
URL: https://github.com/apache/pulsar/pull/17573#discussion_r968423063


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/nonpersistent/NonPersistentSubscription.java:
##########
@@ -210,7 +210,7 @@ public void consumerFlow(Consumer consumer, int additionalNumberOfMessages) {
 
     @Override
     public void acknowledgeMessage(List<Position> position, AckType ackType, Map<String, Long> properties) {
-        // No-op
+        updateLastActive();

Review Comment:
   Please help add a test to avoid the regression.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] dragonls commented on a diff in pull request #17573: [fix][broker] Fix unexpected subscription deletion

Posted by GitBox <gi...@apache.org>.
dragonls commented on code in PR #17573:
URL: https://github.com/apache/pulsar/pull/17573#discussion_r969093945


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/nonpersistent/NonPersistentSubscription.java:
##########
@@ -210,7 +210,7 @@ public void consumerFlow(Consumer consumer, int additionalNumberOfMessages) {
 
     @Override
     public void acknowledgeMessage(List<Position> position, AckType ackType, Map<String, Long> properties) {
-        // No-op
+        updateLastActive();

Review Comment:
   I will add a test to cover this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] dragonls commented on pull request #17573: [fix][broker] Fix unexpected subscription deletion

Posted by GitBox <gi...@apache.org>.
dragonls commented on PR #17573:
URL: https://github.com/apache/pulsar/pull/17573#issuecomment-1244828697

   > If I understand correctly, this issue does not relate to the broker restart, right? It is that the subscriptionExpirationTimeMinutes should be the last ack time of the consumers in the subscription. Instead of the last time that the consumer was modified (consumer add or remove) in the subscription.
   
   If the broker shutdown gracefully, everything is fine. Because it will go through the `RemoveConsumer`, the `lastActive` will be updated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui merged pull request #17573: [fix][broker] Fix unexpected subscription deletion caused by the cursor last active time not updated in time

Posted by GitBox <gi...@apache.org>.
codelipenghui merged PR #17573:
URL: https://github.com/apache/pulsar/pull/17573


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] dragonls commented on a diff in pull request #17573: [fix][broker] Fix unexpected subscription deletion

Posted by GitBox <gi...@apache.org>.
dragonls commented on code in PR #17573:
URL: https://github.com/apache/pulsar/pull/17573#discussion_r970456697


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/nonpersistent/NonPersistentSubscription.java:
##########
@@ -210,7 +210,7 @@ public void consumerFlow(Consumer consumer, int additionalNumberOfMessages) {
 
     @Override
     public void acknowledgeMessage(List<Position> position, AckType ackType, Map<String, Long> properties) {
-        // No-op
+        updateLastActive();

Review Comment:
   I found that non-persistent consumer ack will not send ack command to broker, so removing the modification in `NonPersistentSubscription.java`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] liangyepianzhou commented on pull request #17573: [fix][broker] Fix unexpected subscription deletion

Posted by GitBox <gi...@apache.org>.
liangyepianzhou commented on PR #17573:
URL: https://github.com/apache/pulsar/pull/17573#issuecomment-1242741823

   If I understand correctly, this issue does not relate to the broker restart, right?
   It is that the subscriptionExpirationTimeMinutes should be the last ack time of the consumers in the subscription.
   Instead of the last time that the consumer was modified (consumer add or remove) in the subscription. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] dragonls commented on a diff in pull request #17573: [fix][broker] Fix unexpected subscription deletion

Posted by GitBox <gi...@apache.org>.
dragonls commented on code in PR #17573:
URL: https://github.com/apache/pulsar/pull/17573#discussion_r970456697


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/nonpersistent/NonPersistentSubscription.java:
##########
@@ -210,7 +210,7 @@ public void consumerFlow(Consumer consumer, int additionalNumberOfMessages) {
 
     @Override
     public void acknowledgeMessage(List<Position> position, AckType ackType, Map<String, Long> properties) {
-        // No-op
+        updateLastActive();

Review Comment:
   I found that non-persistent consumer ack will not sent ack command to broker, so removing the modification in `NonPersistentSubscription.java`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org