You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/10/20 02:28:07 UTC

[GitHub] [pulsar] codelipenghui commented on a diff in pull request #18123: [improve][ml]Make cursor-recover retry when ledger-recover timeout

codelipenghui commented on code in PR #18123:
URL: https://github.com/apache/pulsar/pull/18123#discussion_r1000082416


##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java:
##########
@@ -500,6 +502,15 @@ protected void recoverFromLedger(final ManagedCursorInfo info, final VoidCallbac
             if (log.isInfoEnabled()) {
                 log.info("[{}] Opened ledger {} for consumer {}. rc={}", ledger.getName(), ledgerId, name, rc);
             }
+            // If ledgerMeta exists, it means ledger is not deleted normally and bookie is trying to recover the data.
+            // We will make several attempts to ensure that do not lose cursor data.
+            if (maybeLedgerRecovering(rc) && retryTimesWhenLedgerRecovering > 0) {
+                log.warn("[{}] Opened ledger {} for consumer {}. rc={}. Ledger maybe is recovering,"
+                        + " try to recover again, at most try {} times", ledger.getName(), ledgerId, name, rc,
+                        retryTimesWhenLedgerRecovering - 1);
+                recoverFromLedger(info, callback, retryTimesWhenLedgerRecovering - 1);
+                return;
+            }

Review Comment:
   Looks like we can just add the `TimeoutException` to the recoverable error list. 
   
   ```java
   public static boolean isBkErrorNotRecoverable(int rc) {
       switch (rc) {
       case Code.NoSuchLedgerExistsException:
       case Code.NoSuchLedgerExistsOnMetadataServerException:
       case Code.ReadException:
       case Code.LedgerRecoveryException:
       case Code.NoSuchEntryException:
           return true;
   
       default:
           return false;
       }
   }
   ```
   
   We introduced a new retry mechanism, but if we reach the max retry count, the subscription will rewind to the oldest available entry. 
   
   This is unexpected from the user's perspective, we should not skip the recovery from the bookkeeper if users don't want to skip the non-recoverable data. 
   
   I think treating the `TimeoutException` as a recoverable error is simple here. We are not able to load the data from the bookkeeper now. Just let the topic load get failed. The client side will trigger the topic loading again.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org