Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/09/20 08:29:53 UTC

[GitHub] [pulsar] eolivelli opened a new pull request, #17736: [bugfix] ManagedLedger: move to FENCED state in case of BadVersionException (DRAFT)

eolivelli opened a new pull request, #17736:
URL: https://github.com/apache/pulsar/pull/17736

   ### Motivation
   
   A ManagedLedger may continue to work even after it has seen a BadVersionException.
   For instance, background activities such as trimming ledgers or offloading will keep running.
   This is very dangerous because it leads to a split-brain-like situation (not a true split brain): two brokers continue to process the metadata (and data) of the same ledger.
   
   ### Modifications
   
   - Force the ManagedLedger to move to the Fenced state in case of BadVersionException
   - Fail the "offload loop" in case of Fenced state
   - Better cleanup of the PersistentTopic
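
The modifications above can be pictured with a minimal sketch. It is not the code of this PR: `FencingSketch`, its methods, and the local `BadVersionException` class are hypothetical stand-ins, and the version check is a toy model of a compare-and-set metadata write. The sketch only assumes what the description states: a version conflict moves the ledger to Fenced, and background work stops once it is fenced.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the metadata store's version-conflict error
// (not the real Pulsar/BookKeeper class).
class BadVersionException extends Exception {
    BadVersionException(String message) {
        super(message);
    }
}

// Hypothetical, heavily simplified model of a managed ledger's lifecycle.
class FencingSketch {
    enum State { Open, Fenced, Closed }

    private final AtomicReference<State> state = new AtomicReference<>(State.Open);
    private long knownVersion = 0;

    State state() {
        return state.get();
    }

    // Toy compare-and-set write: fails when the store's version differs from
    // the version this instance last saw (another owner wrote in between).
    private void writeMetadata(long storeVersion) throws BadVersionException {
        if (storeVersion != knownVersion) {
            throw new BadVersionException(
                    "expected version " + knownVersion + " but store has " + storeVersion);
        }
        knownVersion++;
    }

    // Every metadata update goes through here; a version conflict fences the ledger.
    synchronized boolean updateMetadata(long storeVersion) {
        if (state.get() != State.Open) {
            return false;
        }
        try {
            writeMetadata(storeVersion);
            return true;
        } catch (BadVersionException e) {
            state.compareAndSet(State.Open, State.Fenced);
            return false;
        }
    }

    // Background work (trimming, offloading) refuses to run once fenced or closed.
    boolean runBackgroundTask(Runnable task) {
        if (state.get() != State.Open) {
            return false;
        }
        task.run();
        return true;
    }
}
```

In this model a single failed update is enough to stop all later background tasks, which is the behavior the PR aims for: the instance that lost ownership stops touching the ledger's metadata.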
   
   ### Verifying this change
   
   I will add tests
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If the box was checked, please highlight the changes*
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] Anything that affects deployment
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc-required` 
   (Your PR needs to update docs and you will update later)
   
   - [x] `doc-not-needed` 
   (Please explain why)
   
   - [ ] `doc` 
   (Your PR contains doc changes)
   
   - [ ] `doc-complete`
   (Docs have been already added)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] eolivelli commented on a diff in pull request #17736: [bugfix] ManagedLedger: move to FENCED state in case of BadVersionException

Posted by GitBox <gi...@apache.org>.
eolivelli commented on code in PR #17736:
URL: https://github.com/apache/pulsar/pull/17736#discussion_r977249528


##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -3842,12 +3868,21 @@ private void scheduleTimeoutTask() {
                     ? Math.max(config.getAddEntryTimeoutSeconds(), config.getReadEntryTimeoutSeconds())
                     : timeoutSec;
             this.timeoutTask = this.scheduledExecutor.scheduleAtFixedRate(safeRun(() -> {
-                checkAddTimeout();
-                checkReadTimeout();
+                checkTimeouts();
             }), timeoutSec, timeoutSec, TimeUnit.SECONDS);
         }
     }
 
+    private void checkTimeouts() {
+        final State state = STATE_UPDATER.get(this);
+        if (state == State.Closed
+                || state == State.Fenced) {
+            return;

Review Comment:
   There are already logs that say that we fenced or closed the topic.
   I don't think it is worth adding anything more.





[GitHub] [pulsar] gaoran10 commented on a diff in pull request #17736: [bugfix] ManagedLedger: move to FENCED state in case of BadVersionException

Posted by GitBox <gi...@apache.org>.
gaoran10 commented on code in PR #17736:
URL: https://github.com/apache/pulsar/pull/17736#discussion_r977280528


##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -2463,12 +2473,19 @@ void internalTrimLedgers(boolean isTruncate, CompletableFuture<?> promise) {
                 log.debug("[{}] Start TrimConsumedLedgers. ledgers={} totalSize={}", name, ledgers.keySet(),
                         TOTAL_SIZE_UPDATER.get(this));
             }
-            if (STATE_UPDATER.get(this) == State.Closed) {
+            State currentState = STATE_UPDATER.get(this);
+            if (currentState == State.Closed) {
                 log.debug("[{}] Ignoring trimming request since the managed ledger was already closed", name);
                 trimmerMutex.unlock();
                 promise.completeExceptionally(new ManagedLedgerAlreadyClosedException("Can't trim closed ledger"));
                 return;
             }
+            if (currentState == State.Fenced) {
+                log.debug("[{}] Ignoring trimming request since the managed ledger was already fenced", name);
+                trimmerMutex.unlock();

Review Comment:
   Sorry, after checking the code, I think we should release the lock; otherwise the scheduled task will keep trying to acquire the lock every 100 milliseconds.
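
A small sketch of why the early-return path has to release the mutex. This is not the ManagedLedgerImpl code: `TrimmerSketch` is hypothetical, a one-permit `Semaphore` stands in for `trimmerMutex`, and the `"retry"` result marks the point where a real implementation would reschedule itself.

```java
import java.util.concurrent.Semaphore;

// Hypothetical stand-in for a trimmer guarded by a non-reentrant mutex.
class TrimmerSketch {
    enum State { Open, Fenced, Closed }

    // A one-permit semaphore plays the role of the mutex (an assumption of this sketch).
    final Semaphore trimmerMutex = new Semaphore(1);
    volatile State state = State.Open;

    // Toggle used only to compare the two behaviors discussed in the review.
    private final boolean releaseOnEarlyReturn;

    TrimmerSketch(boolean releaseOnEarlyReturn) {
        this.releaseOnEarlyReturn = releaseOnEarlyReturn;
    }

    // Returns "retry" when the mutex is busy; a real trimmer would reschedule here.
    String trim() {
        if (!trimmerMutex.tryAcquire()) {
            return "retry";
        }
        if (state != State.Open) {
            // Early return: without this release, every later call sees a busy mutex.
            if (releaseOnEarlyReturn) {
                trimmerMutex.release();
            }
            return "skipped";
        }
        try {
            // The actual trimming work would go here.
            return "trimmed";
        } finally {
            trimmerMutex.release();
        }
    }
}
```

With `releaseOnEarlyReturn` set to `false`, the first call on a fenced instance returns `"skipped"` and every later call returns `"retry"`, which corresponds to the endless rescheduling described in the comment.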





[GitHub] [pulsar] eolivelli commented on pull request #17736: [bugfix] ManagedLedger: move to FENCED state in case of BadVersionException

Posted by GitBox <gi...@apache.org>.
eolivelli commented on PR #17736:
URL: https://github.com/apache/pulsar/pull/17736#issuecomment-1253380790

   @dlg99 I have fixed the test.
   The test was trying to do something that cannot happen:
   it was calling "ledgerManager.delete()" and then "truncate()".
   
   I removed "delete". The test is still meaningful because it checks that truncation works when ledgers are missing (due to concurrent execution or previous deletion).




[GitHub] [pulsar] hangc0276 commented on a diff in pull request #17736: [bugfix] ManagedLedger: move to FENCED state in case of BadVersionException

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on code in PR #17736:
URL: https://github.com/apache/pulsar/pull/17736#discussion_r976525563


##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -3842,12 +3868,21 @@ private void scheduleTimeoutTask() {
                     ? Math.max(config.getAddEntryTimeoutSeconds(), config.getReadEntryTimeoutSeconds())
                     : timeoutSec;
             this.timeoutTask = this.scheduledExecutor.scheduleAtFixedRate(safeRun(() -> {

Review Comment:
   Suggest changing
   ```
   safeRun(() -> {
                   checkTimeouts();
               })
   ```
   
   to `safeRun(this::checkTimeouts)`
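
For illustration, the two forms are equivalent in behavior. The sketch below uses a hypothetical local `safeRun` helper (not the real utility used by ManagedLedgerImpl) and a simplified `checkTimeouts` with the Closed/Fenced guard from the diff; the counter exists only so the effect can be observed.

```java
import java.util.concurrent.atomic.AtomicInteger;

class TimeoutCheckSketch {
    enum State { Open, Fenced, Closed }

    volatile State state = State.Open;
    final AtomicInteger checksRun = new AtomicInteger();

    // Hypothetical stand-in for a safeRun helper: it catches failures so that a
    // periodic task is not cancelled by an exception (scheduleAtFixedRate
    // suppresses later executions once a run throws).
    static Runnable safeRun(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                System.err.println("Unexpected error in periodic task: " + t);
            }
        };
    }

    // Simplified version of the guard in the diff: do nothing once closed or fenced.
    void checkTimeouts() {
        if (state == State.Closed || state == State.Fenced) {
            return;
        }
        checksRun.incrementAndGet();
    }

    // Lambda form, as in the original hunk.
    Runnable asLambda() {
        return safeRun(() -> {
            checkTimeouts();
        });
    }

    // Method-reference form, as suggested in the review.
    Runnable asMethodRef() {
        return safeRun(this::checkTimeouts);
    }
}
```

Either `Runnable` can be handed to `ScheduledExecutorService.scheduleAtFixedRate`; the method reference just drops the extra lambda body.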



##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -3639,6 +3664,7 @@ private void checkManagedLedgerIsOpen() throws ManagedLedgerException {
     }
 
     synchronized void setFenced() {
+        log.info("{} Moving to Fenced state", name);

Review Comment:
   Do we need to change the log level to `warn`?



##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -3842,12 +3868,21 @@ private void scheduleTimeoutTask() {
                     ? Math.max(config.getAddEntryTimeoutSeconds(), config.getReadEntryTimeoutSeconds())
                     : timeoutSec;
             this.timeoutTask = this.scheduledExecutor.scheduleAtFixedRate(safeRun(() -> {
-                checkAddTimeout();
-                checkReadTimeout();
+                checkTimeouts();
             }), timeoutSec, timeoutSec, TimeUnit.SECONDS);
         }
     }
 
+    private void checkTimeouts() {
+        final State state = STATE_UPDATER.get(this);
+        if (state == State.Closed
+                || state == State.Fenced) {
+            return;

Review Comment:
   Do we need to add a `warn` log?





[GitHub] [pulsar] codelipenghui commented on pull request #17736: [bugfix] ManagedLedger: move to FENCED state in case of BadVersionException

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on PR #17736:
URL: https://github.com/apache/pulsar/pull/17736#issuecomment-1253698173

   We already handled BadVersion this way before: https://github.com/apache/pulsar/pull/17736/files#diff-f6a849bd8fdb782ef6c17a2e07a2c54c3dc7d1655c00ec3546d5f3b3fc61e970L1537
   
   Looks good to me. Approved the PR




[GitHub] [pulsar] gaoran10 commented on a diff in pull request #17736: [bugfix] ManagedLedger: move to FENCED state in case of BadVersionException

Posted by GitBox <gi...@apache.org>.
gaoran10 commented on code in PR #17736:
URL: https://github.com/apache/pulsar/pull/17736#discussion_r977130919


##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -2463,12 +2473,19 @@ void internalTrimLedgers(boolean isTruncate, CompletableFuture<?> promise) {
                 log.debug("[{}] Start TrimConsumedLedgers. ledgers={} totalSize={}", name, ledgers.keySet(),
                         TOTAL_SIZE_UPDATER.get(this));
             }
-            if (STATE_UPDATER.get(this) == State.Closed) {
+            State currentState = STATE_UPDATER.get(this);
+            if (currentState == State.Closed) {
                 log.debug("[{}] Ignoring trimming request since the managed ledger was already closed", name);
                 trimmerMutex.unlock();
                 promise.completeExceptionally(new ManagedLedgerAlreadyClosedException("Can't trim closed ledger"));
                 return;
             }
+            if (currentState == State.Fenced) {
+                log.debug("[{}] Ignoring trimming request since the managed ledger was already fenced", name);
+                trimmerMutex.unlock();

Review Comment:
   Maybe we don't need to unlock the `trimmerMutex`.





[GitHub] [pulsar] eolivelli commented on pull request #17736: [bugfix] ManagedLedger: move to FENCED state in case of BadVersionException (DRAFT)

Posted by GitBox <gi...@apache.org>.
eolivelli commented on PR #17736:
URL: https://github.com/apache/pulsar/pull/17736#issuecomment-1252053553

   I am adding test cases in order to have examples of the problems and to prevent regressions in the future




[GitHub] [pulsar] eolivelli commented on a diff in pull request #17736: [bugfix] ManagedLedger: move to FENCED state in case of BadVersionException

Posted by GitBox <gi...@apache.org>.
eolivelli commented on code in PR #17736:
URL: https://github.com/apache/pulsar/pull/17736#discussion_r977248421


##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -2463,12 +2473,19 @@ void internalTrimLedgers(boolean isTruncate, CompletableFuture<?> promise) {
                 log.debug("[{}] Start TrimConsumedLedgers. ledgers={} totalSize={}", name, ledgers.keySet(),
                         TOTAL_SIZE_UPDATER.get(this));
             }
-            if (STATE_UPDATER.get(this) == State.Closed) {
+            State currentState = STATE_UPDATER.get(this);
+            if (currentState == State.Closed) {
                 log.debug("[{}] Ignoring trimming request since the managed ledger was already closed", name);
                 trimmerMutex.unlock();
                 promise.completeExceptionally(new ManagedLedgerAlreadyClosedException("Can't trim closed ledger"));
                 return;
             }
+            if (currentState == State.Fenced) {
+                log.debug("[{}] Ignoring trimming request since the managed ledger was already fenced", name);
+                trimmerMutex.unlock();

Review Comment:
   I am not sure.
   I prefer to keep the ML in a clean state.
   I did the same thing that is done for a "closed" ML;
   at this point, being fenced is very much like being "Closed".





[GitHub] [pulsar] eolivelli commented on a diff in pull request #17736: [bugfix] ManagedLedger: move to FENCED state in case of BadVersionException

Posted by GitBox <gi...@apache.org>.
eolivelli commented on code in PR #17736:
URL: https://github.com/apache/pulsar/pull/17736#discussion_r977250234


##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -3639,6 +3664,7 @@ private void checkManagedLedgerIsOpen() throws ManagedLedgerException {
     }
 
     synchronized void setFenced() {
+        log.info("{} Moving to Fenced state", name);

Review Comment:
   This is not a "problem".
   It may happen, and we are handling it safely.
   There is nothing that the sysadmin should be afraid of.
   
   We should log at "WARN" or "ERROR" when something bad happens and extra care is needed.





[GitHub] [pulsar] eolivelli merged pull request #17736: [bugfix] ManagedLedger: move to FENCED state in case of BadVersionException

Posted by GitBox <gi...@apache.org>.
eolivelli merged PR #17736:
URL: https://github.com/apache/pulsar/pull/17736




[GitHub] [pulsar] gaoran10 commented on a diff in pull request #17736: [bugfix] ManagedLedger: move to FENCED state in case of BadVersionException

Posted by GitBox <gi...@apache.org>.
gaoran10 commented on code in PR #17736:
URL: https://github.com/apache/pulsar/pull/17736#discussion_r977269595


##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -2463,12 +2473,19 @@ void internalTrimLedgers(boolean isTruncate, CompletableFuture<?> promise) {
                 log.debug("[{}] Start TrimConsumedLedgers. ledgers={} totalSize={}", name, ledgers.keySet(),
                         TOTAL_SIZE_UPDATER.get(this));
             }
-            if (STATE_UPDATER.get(this) == State.Closed) {
+            State currentState = STATE_UPDATER.get(this);
+            if (currentState == State.Closed) {
                 log.debug("[{}] Ignoring trimming request since the managed ledger was already closed", name);
                 trimmerMutex.unlock();
                 promise.completeExceptionally(new ManagedLedgerAlreadyClosedException("Can't trim closed ledger"));
                 return;
             }
+            if (currentState == State.Fenced) {
+                log.debug("[{}] Ignoring trimming request since the managed ledger was already fenced", name);
+                trimmerMutex.unlock();

Review Comment:
   OK





[GitHub] [pulsar] dlg99 commented on pull request #17736: [bugfix] ManagedLedger: move to FENCED state in case of BadVersionException

Posted by GitBox <gi...@apache.org>.
dlg99 commented on PR #17736:
URL: https://github.com/apache/pulsar/pull/17736#issuecomment-1252710552

   LGTM but it broke testTruncateCorruptDataLedger 
   ```
     Error:  Tests run: 9, Failures: 1, Errors: 0, Skipped: 3, Time elapsed: 45.675 s <<< FAILURE! - in org.apache.pulsar.broker.service.BrokerBkEnsemblesTests
     Error:  testTruncateCorruptDataLedger(org.apache.pulsar.broker.service.BrokerBkEnsemblesTests)  Time elapsed: 3.13 s  <<< FAILURE!
     org.apache.pulsar.client.admin.PulsarAdminException$ServerSideErrorException: 
     
      --- An unexpected error occurred in the server ---
     
     Message: Can't trim fenced ledger
     
     Stacktrace:
     
     org.apache.bookkeeper.mledger.ManagedLedgerException$ManagedLedgerAlreadyClosedException: Can't trim fenced ledger
   ```

