Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/06/09 02:18:11 UTC

[GitHub] [pulsar] TakaHiR07 opened a new issue, #15992: [AutoRecovery] keep rereplicate a ledger which is deleted

TakaHiR07 opened a new issue, #15992:
URL: https://github.com/apache/pulsar/issues/15992

   **BUG REPORT**
   
   ***Describe the bug***
   
   Our production Pulsar cluster has multiple nodes with E-Qw-Qa (3-3-2). Auto-recovery is enabled with "./bin/bookkeeper shell autorecovery -enable", and the BookKeeper version is 4.14.1. One bookie server is now down and the cluster is performing auto-recovery. However, one ledger cannot be read from the other 2 bookies in its ensemble; both return the same error: Ledger 1294 not found (it seems the ledger has been deleted).
   
   
   
   ```
   [BookieReadThreadPool-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.proto.ReadLacProcessorV3 - No ledger found while performing readLac from ledger: 1294
   org.apache.bookkeeper.bookie.Bookie$NoLedgerException: Ledger 1294 not found
           at org.apache.bookkeeper.bookie.LedgerDescriptor.createReadOnly(LedgerDescriptor.java:52) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1]
           at org.apache.bookkeeper.bookie.HandleFactoryImpl.getReadOnlyHandle(HandleFactoryImpl.java:61) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1]
   ```
   
   
   However, the ReplicationWorker keeps trying to rereplicate this ledger and keeps failing. According to the following log, it throws BKNotEnoughBookiesException, so ReplicationWorker#run keeps running and keeps trying to replicate a ledger that cannot be replicated. The result is too many recovery read requests to the other 2 ensemble bookies, which affects normal read requests.
   
   ```
   [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadLacOp - While readLac ledger: 1294 did not hear success responses from all of ensemble
   [ReplicationWorker] INFO  org.apache.bookkeeper.replication.ReplicationWorker - BKReadException while rereplicating ledger 1294. Enough Bookies might not have available So, no harm to continue
   ```
   
   ```
   [BookieReadThreadPool-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.proto.ReadLacProcessorV3 - IOException while trying to read last entry: 1294
   org.apache.bookkeeper.bookie.Bookie$NoEntryException: Entry -1 not found in 1294
   ```
   
   
   The ZooKeeper metadata contains ledger 1294 under /ledgers/underreplication/ledgers:
   
   ![WeCom screenshot_40feaa39-c1d5-4ce1-ae37-6a7f1eadd339](https://user-images.githubusercontent.com/13505225/172157384-2bb225a8-c924-47d4-9d02-aa6f7046f4d7.png)
   
   
   
   
   ***Expected behavior***
   
   AutoRecovery should skip deleted ledgers when doing recovery.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] github-actions[bot] commented on issue #15992: [AutoRecovery] keep rereplicate a ledger which is deleted

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #15992:
URL: https://github.com/apache/pulsar/issues/15992#issuecomment-1213633707

   The issue had no activity for 30 days, mark with Stale label.




[GitHub] [pulsar] dlg99 commented on issue #15992: [AutoRecovery] keep rereplicate a ledger which is deleted

Posted by GitBox <gi...@apache.org>.
dlg99 commented on issue #15992:
URL: https://github.com/apache/pulsar/issues/15992#issuecomment-1182230169

   AutoRecovery (ReplicationWorker) actually handles the case of a deleted ledger:
   ```
           } catch (BKNoSuchLedgerExistsOnMetadataServerException e) {
               // Ledger might have been deleted by user
               LOG.info("BKNoSuchLedgerExistsOnMetadataServerException while opening "
                   + "ledger {} for replication. Other clients "
                   + "might have deleted the ledger. "
                   + "So, no harm to continue", ledgerIdToReplicate);
               underreplicationManager.markLedgerReplicated(ledgerIdToReplicate);
               getExceptionCounter("BKNoSuchLedgerExistsOnMetadataServerException").inc();
               return false;
           } catch (BKNotEnoughBookiesException e) {
               logBKExceptionAndReleaseLedger(e, ledgerIdToReplicate);
               throw e;
           } catch (BKException e) {
               logBKExceptionAndReleaseLedger(e, ledgerIdToReplicate);
               return false;
   ```
               
   Is it possible that the ledger has not been deleted from the metadata (i.e. it still exists in ZooKeeper: ls /ledgers/000/..)?
   IIRC, ReadLac reporting that no ledger exists means that the bookie that received the request could not find the ledger data in its local index/storage. That could mean, for example, that the ledger has no data (which is OK), that the data is lost, that the metadata is corrupt and points to the wrong bookie, or something else.
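   
   To make the distinction concrete, here is a minimal toy model of the two outcomes, not BookKeeper code. Every name in it (ReplicationLoopSketch, Outcome, NoSuchLedgerInMetadata, ReadFailure, processNext) is hypothetical, and it assumes a failed recovery read simply puts the ledger back in the under-replicated queue. It only illustrates why a ledger whose metadata still exists but whose data cannot be read would be retried on every pass, while a ledger missing from metadata is dropped once:
   
   ```java
   import java.util.ArrayDeque;
   import java.util.Deque;
   import java.util.HashSet;
   import java.util.Set;

   // Toy model only. None of these types are BookKeeper classes.
   class ReplicationLoopSketch {
       enum Outcome { IDLE, REPLICATED, DROPPED_AS_DELETED, RELEASED_FOR_RETRY }

       /** Hypothetical stand-in for "ledger is gone from the metadata store". */
       static class NoSuchLedgerInMetadata extends Exception {}

       /** Hypothetical stand-in for a failed recovery read against the bookies. */
       static class ReadFailure extends Exception {}

       final Set<Long> metadata = new HashSet<>();          // ledgers that have metadata
       final Set<Long> readableOnBookies = new HashSet<>(); // ledgers whose data can be read
       final Deque<Long> underReplicated = new ArrayDeque<>();
       int recoveryReads = 0;

       private void openForRecovery(long ledgerId) throws NoSuchLedgerInMetadata, ReadFailure {
           if (!metadata.contains(ledgerId)) {
               throw new NoSuchLedgerInMetadata();
           }
           recoveryReads++; // metadata exists, so the bookies are contacted
           if (!readableOnBookies.contains(ledgerId)) {
               throw new ReadFailure();
           }
       }

       /** Processes one queued ledger and reports what happened to it. */
       Outcome processNext() {
           Long ledgerId = underReplicated.poll();
           if (ledgerId == null) {
               return Outcome.IDLE;
           }
           try {
               openForRecovery(ledgerId);
               return Outcome.REPLICATED;
           } catch (NoSuchLedgerInMetadata e) {
               // Deleted from metadata: not re-queued, so it is never retried.
               return Outcome.DROPPED_AS_DELETED;
           } catch (ReadFailure e) {
               // Metadata exists but data is unreadable: back in the queue.
               underReplicated.add(ledgerId);
               return Outcome.RELEASED_FOR_RETRY;
           }
       }

       public static void main(String[] args) {
           ReplicationLoopSketch s = new ReplicationLoopSketch();
           s.metadata.add(1294L);        // metadata still present
           s.underReplicated.add(1294L); // but no bookie can serve the data
           for (int i = 0; i < 3; i++) {
               System.out.println(s.processNext());
           }
           System.out.println("recovery reads: " + s.recoveryReads);
           s.metadata.remove(1294L);     // now the metadata is gone as well
           System.out.println(s.processNext());
       }
   }
   ```
   
   In this model each retry costs one more recovery read, which would match the read pressure described in the report if the metadata for ledger 1294 is still present.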




[GitHub] [pulsar] github-actions[bot] commented on issue #15992: [AutoRecovery] keep rereplicate a ledger which is deleted

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #15992:
URL: https://github.com/apache/pulsar/issues/15992#issuecomment-1179639106

   The issue had no activity for 30 days, mark with Stale label.

