Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2022/07/14 08:04:43 UTC

[GitHub] [bookkeeper] wolfstudy opened a new issue, #3405: Ledger has been in IN_RECOVERY state and cannot be recovered

wolfstudy opened a new issue, #3405:
URL: https://github.com/apache/bookkeeper/issues/3405

   **BUG REPORT**
   
   ***Describe the bug***
   
   
   Initially, we found that a Pulsar topic could not handle normal message production and delivery. Querying the topic's stats and stats-internal returned empty results, as follows:
   
   Stats:
   
   ```
   {
     "msgRateIn" : 0.0,
     "msgThroughputIn" : 0.0,
     "msgRateOut" : 0.0,
     "msgThroughputOut" : 0.0,
     "bytesInCounter" : 0,
     "msgInCounter" : 0,
     "bytesOutCounter" : 0,
     "msgOutCounter" : 0,
     "averageMsgSize" : 0.0,
     "msgChunkPublished" : false,
     "storageSize" : 0,
     "backlogSize" : 0,
     "offloadedStorageSize" : 0,
     "publishers" : [ ],
     "subscriptions" : { },
     "replication" : { },
     "nonContiguousDeletedMessagesRanges" : 0,
     "nonContiguousDeletedMessagesRangesSerializedSize" : 0,
     "publishRateLimitedTimes" : 0,
     "metadata" : {
       "partitions" : 1
     },
     "partitions" : { }
   }
   ```
   
   Stats-internal:
   
   ```
   {
     "metadata" : {
       "partitions" : 0
     },
     "partitions" : { }
   }
   ```
   
   At this point, I checked the bookie log and found the following error message:
   
   ![image](https://user-images.githubusercontent.com/20965307/178931175-d21d9200-af4e-4df7-9b85-dbbb795ea9c5.png)
   
   
   The relevant source code is here: https://github.com/apache/pulsar/blob/6704f12104/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerFactoryImpl.java#L366-L374
   
   This shows that the ledger was not recovered within the default timeout configured by `managedLedgerMetadataOperationsTimeoutSeconds=60`.
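
   To illustrate how a recovery that never finishes surfaces as a timeout, here is a minimal, hypothetical sketch using only `java.util.concurrent`. It is not Pulsar's `ManagedLedgerFactoryImpl` code; it assumes the broker simply waits on a ledger-open future for a bounded time, analogous to `managedLedgerMetadataOperationsTimeoutSeconds`. The class and method names are illustrative.

   ```java
   import java.util.concurrent.CompletableFuture;
   import java.util.concurrent.ExecutionException;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.TimeoutException;

   // Hypothetical model, not Pulsar code: wait for a ledger open (which may
   // include recovery) for at most a fixed timeout.
   class OpenWithTimeout {

       // Returns "OPENED" if the future completes in time, "TIMED_OUT" otherwise.
       static String awaitOpen(CompletableFuture<Void> openFuture, long timeoutMillis) {
           try {
               openFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
               return "OPENED";
           } catch (TimeoutException e) {
               return "TIMED_OUT";
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
               return "INTERRUPTED";
           } catch (ExecutionException e) {
               return "FAILED";
           }
       }

       public static void main(String[] args) {
           // A recovery that never finishes: the future is never completed.
           CompletableFuture<Void> stuckRecovery = new CompletableFuture<>();
           System.out.println(awaitOpen(stuckRecovery, 100)); // prints TIMED_OUT

           // A recovery that has already finished.
           CompletableFuture<Void> fastRecovery = CompletableFuture.completedFuture(null);
           System.out.println(awaitOpen(fastRecovery, 100)); // prints OPENED
       }
   }
   ```

   In this model, a ledger that stays in IN_RECOVERY behaves like `stuckRecovery`: every load attempt ends in a timeout rather than an explicit error.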
   
   We then used the `readledger` command and found that this ledger stores nearly 7 million entries. When the broker loads the topic, it opens this ledger; because the ledger is always in the IN_RECOVERY state, the open goes through the following recovery logic:
   
   ```java
   if (doRecovery) {
       // Run ledger recovery; the callback is dispatched via the client's main worker pool.
       lh.recover(new OrderedGenericCallback<Void>(bk.getMainWorkerPool(), ledgerId) {
           @Override
           public void safeOperationComplete(int rc, Void result) {
               if (rc == BKException.Code.OK) {
                   // Recovery succeeded: hand the recovered handle to the caller.
                   openComplete(BKException.Code.OK, lh);
               } else if (rc == BKException.Code.UnauthorizedAccessException) {
                   // Permission problem: close the handle and report it as-is.
                   closeLedgerHandle();
                   openComplete(BKException.Code.UnauthorizedAccessException, null);
               } else {
                   // Any other failure: close the handle and report a recovery error.
                   closeLedgerHandle();
                   openComplete(bk.getReturnRc(BKException.Code.LedgerRecoveryException), null);
               }
           }

           @Override
           public String toString() {
               return String.format("Recover(%d)", ledgerId);
           }
       });
   }
   ```
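
   The decision of whether to run recovery at all can be modeled roughly as below. This is a simplified, hypothetical sketch, not BookKeeper's actual open path: it assumes that a ledger whose metadata is already CLOSED needs no recovery, while any other state (OPEN or IN_RECOVERY) triggers `recover()` when the caller opens with recovery. The `State` enum mirrors the states mentioned in this issue; the class and method names are made up for illustration.

   ```java
   // Hypothetical model of the open-with-recovery decision; names are illustrative.
   class OpenDecision {

       enum State { OPEN, IN_RECOVERY, CLOSED }

       // Recovery is only attempted if the caller asked for it and the ledger
       // has not already been closed.
       static boolean needsRecovery(State state, boolean openWithRecovery) {
           return openWithRecovery && state != State.CLOSED;
       }

       public static void main(String[] args) {
           for (State s : State.values()) {
               // prints e.g. "IN_RECOVERY -> recover? true"
               System.out.println(s + " -> recover? " + needsRecovery(s, true));
           }
       }
   }
   ```

   Under this model, a ledger stuck in IN_RECOVERY goes through recovery again on every open, which matches the behavior described in this report.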
   
   However, because so many entries need to be recovered, the ledger never finishes recovery. As a result, the broker cannot load the topic, and the whole service remains unavailable.
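
   A back-of-the-envelope calculation makes the timing argument concrete. Assuming (as this report suggests, but without measurement) that recovery has to process each of the roughly 7 million entries before the 60-second timeout expires, the required throughput would be about 116,667 entries per second:

   ```java
   // Back-of-the-envelope estimate. The assumption that every entry must be
   // processed within the timeout comes from the report, not from measurement.
   class RecoveryBudget {

       // Minimum entries per second needed to finish within the timeout.
       static double requiredRate(long entries, long timeoutSeconds) {
           return (double) entries / timeoutSeconds;
       }

       public static void main(String[] args) {
           long entries = 7_000_000L;   // nearly 7 million entries, per readledger
           long timeoutSeconds = 60L;   // managedLedgerMetadataOperationsTimeoutSeconds
           // prints "116667 entries/s"
           System.out.printf("%.0f entries/s%n", requiredRate(entries, timeoutSeconds));
       }
   }
   ```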
   
   ***To Reproduce***
   
   We do not know how this situation arose, and we do not know how to reproduce it.
   
   ## Solution
   
   We locally mocked a bookie client object, re-read the ledger's LedgerMetadata, and forcibly reset the ledger state from IN_RECOVERY to CLOSED, after which the problem was resolved.
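
   The workaround can be thought of as a conditional update on versioned metadata. The sketch below is a simplified, hypothetical in-memory model, not the BookKeeper client API: `VersionedLedgerMetadata`, `forceClose`, and its fields are illustrative assumptions. It assumes, as with ZooKeeper-backed metadata, that an update only succeeds against the version that was just read, so a concurrent writer is not silently overwritten. Choosing the correct last entry id is the delicate part and is simply taken as an input here.

   ```java
   // Hypothetical in-memory model of the manual IN_RECOVERY -> CLOSED reset;
   // this is not the BookKeeper API.
   class VersionedLedgerMetadata {

       enum State { OPEN, IN_RECOVERY, CLOSED }

       private State state;
       private long lastEntryId;
       private long version;

       VersionedLedgerMetadata(State state, long lastEntryId, long version) {
           this.state = state;
           this.lastEntryId = lastEntryId;
           this.version = version;
       }

       synchronized State state() { return state; }
       synchronized long lastEntryId() { return lastEntryId; }
       synchronized long version() { return version; }

       // Force the ledger to CLOSED with the given last entry id, but only if the
       // metadata is unchanged since it was read and is not already closed.
       synchronized boolean forceClose(long expectedVersion, long newLastEntryId) {
           if (version != expectedVersion || state == State.CLOSED) {
               return false;
           }
           state = State.CLOSED;
           lastEntryId = newLastEntryId;
           version++;
           return true;
       }

       public static void main(String[] args) {
           VersionedLedgerMetadata m = new VersionedLedgerMetadata(State.IN_RECOVERY, -1L, 5L);
           System.out.println(m.forceClose(5L, 6_999_999L)); // prints true
           System.out.println(m.forceClose(5L, 0L));         // prints false (stale version)
       }
   }
   ```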


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] eolivelli commented on issue #3405: Ledger has been in IN_RECOVERY state and cannot be recovered

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #3405:
URL: https://github.com/apache/bookkeeper/issues/3405#issuecomment-1184377321

   Are you sure that this is a BookKeeper problem?
   Do you have any logs from the BookKeeper client, i.e. the "org.apache.bookkeeper" loggers (excluding the Pulsar mledger logs)?
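
   One detail about this request: Pulsar's managed-ledger classes live under the `org.apache.bookkeeper.mledger` package (see the `ManagedLedgerFactoryImpl` path linked in the report), so a plain prefix match on `org.apache.bookkeeper` would also capture them. A minimal, hypothetical helper for selecting only BookKeeper client loggers by name might look like this; the class name and the filtering approach are assumptions, not part of either project.

   ```java
   // Hypothetical helper: decide whether a logger name belongs to BookKeeper
   // itself rather than to Pulsar's managed ledger (mledger).
   class BookKeeperLoggerFilter {

       private static final String BK_PREFIX = "org.apache.bookkeeper.";
       private static final String MLEDGER_PREFIX = "org.apache.bookkeeper.mledger.";

       static boolean isBookKeeperClientLogger(String loggerName) {
           // Appending "." lets exact package names match and avoids false hits
           // such as "org.apache.bookkeeperX".
           String name = loggerName + ".";
           return name.startsWith(BK_PREFIX) && !name.startsWith(MLEDGER_PREFIX);
       }

       public static void main(String[] args) {
           System.out.println(isBookKeeperClientLogger("org.apache.bookkeeper.client.LedgerHandle")); // true
           System.out.println(isBookKeeperClientLogger("org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl")); // false
           System.out.println(isBookKeeperClientLogger("org.apache.pulsar.broker.PulsarService")); // false
       }
   }
   ```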




[GitHub] [bookkeeper] StevenLuMT commented on issue #3405: Ledger has been in IN_RECOVERY state and cannot be recovered

Posted by GitBox <gi...@apache.org>.
StevenLuMT commented on issue #3405:
URL: https://github.com/apache/bookkeeper/issues/3405#issuecomment-1191750815

   I have a question: what is the ledger's configuration?
   EnsembleSize=?
   WriteQuorum=?
   AckQuorum=?
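
   For context on this question: the three settings must satisfy ensembleSize >= writeQuorum >= ackQuorum, and entries are striped across the ensemble. The minimal sketch below assumes round-robin placement, where entry `e` is stored on ensemble positions `(e + i) % ensembleSize` for `i` in `[0, writeQuorum)`; the class and method names are hypothetical and this is not the BookKeeper API.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical model of ledger quorum settings; not the BookKeeper API.
   class QuorumConfig {

       final int ensembleSize;
       final int writeQuorum;
       final int ackQuorum;

       QuorumConfig(int ensembleSize, int writeQuorum, int ackQuorum) {
           // Required ordering: ensembleSize >= writeQuorum >= ackQuorum >= 1.
           if (ackQuorum < 1 || writeQuorum < ackQuorum || ensembleSize < writeQuorum) {
               throw new IllegalArgumentException(
                   "need ensembleSize >= writeQuorum >= ackQuorum >= 1");
           }
           this.ensembleSize = ensembleSize;
           this.writeQuorum = writeQuorum;
           this.ackQuorum = ackQuorum;
       }

       // Ensemble positions that store the given entry, assuming round-robin striping.
       List<Integer> writeSet(long entryId) {
           List<Integer> bookies = new ArrayList<>();
           for (int i = 0; i < writeQuorum; i++) {
               bookies.add((int) ((entryId + i) % ensembleSize));
           }
           return bookies;
       }

       public static void main(String[] args) {
           QuorumConfig cfg = new QuorumConfig(3, 2, 2);
           System.out.println(cfg.writeSet(2)); // prints [2, 0]
       }
   }
   ```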

