Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/06/09 02:18:11 UTC
[GitHub] [pulsar] TakaHiR07 opened a new issue, #15992: [AutoRecovery] keep rereplicate a ledger which is deleted
TakaHiR07 opened a new issue, #15992:
URL: https://github.com/apache/pulsar/issues/15992
**BUG REPORT**
***Describe the bug***
Our production Pulsar cluster has multiple nodes with E-Qw-Qa set to 3-3-2, and auto-recovery is enabled via "./bin/bookkeeper shell autorecovery -enable". The BookKeeper version is 4.14.1. One bookie server is now down and the cluster is running auto-recovery. However, there is one ledger that cannot be read from the other 2 bookies in its ensemble; both return the same error: Ledger 1294 not found (it seems the ledger has been deleted).
```
[BookieReadThreadPool-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.proto.ReadLacProcessorV3 - No ledger found while performing readLac from ledger: 1294
org.apache.bookkeeper.bookie.Bookie$NoLedgerException: Ledger 1294 not found
at org.apache.bookkeeper.bookie.LedgerDescriptor.createReadOnly(LedgerDescriptor.java:52) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1]
at org.apache.bookkeeper.bookie.HandleFactoryImpl.getReadOnlyHandle(HandleFactoryImpl.java:61) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1]
```
However, the ReplicationWorker keeps trying to rereplicate this ledger and keeps failing. According to the following log, it throws BKNotEnoughBookiesException, so ReplicationWorker#run keeps running and keeps retrying a ledger that cannot be replicated. As a result, it generates too many recovery read requests to the other 2 bookies in the ensemble, which affects normal read requests.
```
[BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.PendingReadLacOp - While readLac ledger: 1294 did not hear success responses from all of ensemble
[ReplicationWorker] INFO org.apache.bookkeeper.replication.ReplicationWorker - BKReadException while rereplicating ledger 1294. Enough Bookies might not have available So, no harm to continue
```
```
[BookieReadThreadPool-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.proto.ReadLacProcessorV3 - IOException while trying to read last entry: 1294
org.apache.bookkeeper.bookie.Bookie$NoEntryException: Entry -1 not found in 1294
```
The ZooKeeper metadata still lists ledger 1294 under /ledgers/underreplication/ledgers:
![WeCom screenshot_40feaa39-c1d5-4ce1-ae37-6a7f1eadd339](https://user-images.githubusercontent.com/13505225/172157384-2bb225a8-c924-47d4-9d02-aa6f7046f4d7.png)
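To illustrate why a single unreadable ledger can generate sustained read load, here is a minimal, self-contained toy model of the retry behavior described above. It is not BookKeeper code: `RetryLoopSketch`, `Stats`, and `run` are hypothetical names, and the assumption that each attempt sends one recovery read to every surviving ensemble bookie is a simplification made only for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongPredicate;

// Toy model only: none of these types exist in BookKeeper.
class RetryLoopSketch {

    /** Counters collected over a bounded run of the loop. */
    static final class Stats {
        final int attempts;
        final int recoveryReads;

        Stats(int attempts, int recoveryReads) {
            this.attempts = attempts;
            this.recoveryReads = recoveryReads;
        }
    }

    /**
     * Polls ledgers from the under-replicated queue until it is empty or
     * maxAttempts is reached. A ledger whose replication fails is put back
     * in the queue, so it is picked up again on a later iteration.
     */
    static Stats run(Deque<Long> underReplicated, LongPredicate canReplicate,
                     int survivingBookies, int maxAttempts) {
        int attempts = 0;
        int recoveryReads = 0;
        while (attempts < maxAttempts && !underReplicated.isEmpty()) {
            long ledgerId = underReplicated.poll();
            attempts++;
            // Assumption for this sketch: one recovery read per surviving bookie.
            recoveryReads += survivingBookies;
            if (!canReplicate.test(ledgerId)) {
                // Failure path: the ledger stays marked as under-replicated.
                underReplicated.add(ledgerId);
            }
        }
        return new Stats(attempts, recoveryReads);
    }

    public static void main(String[] args) {
        Deque<Long> queue = new ArrayDeque<>();
        queue.add(1294L);
        // Ledger 1294 can never be replicated in this scenario.
        Stats stats = run(queue, id -> false, 2, 5);
        System.out.println(stats.attempts + " attempts, "
                + stats.recoveryReads + " recovery reads, queue=" + queue);
    }
}
```

In this model the only thing that stops the loop is the artificial `maxAttempts` bound; without it, a ledger that always fails would be retried indefinitely, which matches the symptom reported here.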
***Expected behavior***
AutoRecovery should skip deleted ledgers during recovery.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] github-actions[bot] commented on issue #15992: [AutoRecovery] keep rereplicate a ledger which is deleted
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #15992:
URL: https://github.com/apache/pulsar/issues/15992#issuecomment-1213633707
The issue had no activity for 30 days, mark with Stale label.
[GitHub] [pulsar] dlg99 commented on issue #15992: [AutoRecovery] keep rereplicate a ledger which is deleted
Posted by GitBox <gi...@apache.org>.
dlg99 commented on issue #15992:
URL: https://github.com/apache/pulsar/issues/15992#issuecomment-1182230169
AutoRecovery (ReplicationWorker) actually handles the case of a deleted ledger:
```
} catch (BKNoSuchLedgerExistsOnMetadataServerException e) {
    // Ledger might have been deleted by user
    LOG.info("BKNoSuchLedgerExistsOnMetadataServerException while opening "
            + "ledger {} for replication. Other clients "
            + "might have deleted the ledger. "
            + "So, no harm to continue", ledgerIdToReplicate);
    underreplicationManager.markLedgerReplicated(ledgerIdToReplicate);
    getExceptionCounter("BKNoSuchLedgerExistsOnMetadataServerException").inc();
    return false;
} catch (BKNotEnoughBookiesException e) {
    logBKExceptionAndReleaseLedger(e, ledgerIdToReplicate);
    throw e;
} catch (BKException e) {
    logBKExceptionAndReleaseLedger(e, ledgerIdToReplicate);
    return false;
```
Is it possible that the ledger has not been deleted from the metadata (i.e. it still exists in zk: ls /ledgers/000/..)?
IIRC, when ReadLac reports that no ledger exists, it means that the bookie receiving the request could not find the ledger data in its local index/storage. That could mean, e.g., that the ledger has no data (ok), the data is lost, the metadata is corrupt and points to the wrong bookie, or something else.
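The distinction made in the catch chain quoted above can be modeled with a small self-contained sketch. The exception classes here are hypothetical stand-ins named after the BookKeeper ones, not the real types, and the `Action` enum is invented for illustration. The sketch also assumes, judging only from its name, that `logBKExceptionAndReleaseLedger` releases the ledger without marking it replicated.

```java
// Stand-in types for illustration; these are NOT the BookKeeper classes.
class CatchChainSketch {

    static class BKException extends Exception { }

    /** Stand-in for BKNoSuchLedgerExistsOnMetadataServerException. */
    static class NoSuchLedgerOnMetadataServer extends BKException { }

    /** Stand-in for BKNotEnoughBookiesException. */
    static class NotEnoughBookies extends BKException { }

    /** Stand-in for any other BKException, e.g. a failed read. */
    static class ReadFailed extends BKException { }

    /** What the worker does with the ledger after each kind of failure. */
    enum Action { MARK_REPLICATED, RELEASE_AND_RETHROW, RELEASE_AND_RETURN_FALSE }

    /** Mirrors the order of the catch clauses in the quoted excerpt. */
    static Action handle(BKException failure) {
        try {
            throw failure;
        } catch (NoSuchLedgerOnMetadataServer e) {
            // Metadata is gone: the ledger leaves the under-replicated set.
            return Action.MARK_REPLICATED;
        } catch (NotEnoughBookies e) {
            // Ledger is released and the exception propagates to the caller.
            return Action.RELEASE_AND_RETHROW;
        } catch (BKException e) {
            // Any other failure: ledger is released and remains under-replicated.
            return Action.RELEASE_AND_RETURN_FALSE;
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(new NoSuchLedgerOnMetadataServer()));
        System.out.println(handle(new NotEnoughBookies()));
        System.out.println(handle(new ReadFailed()));
    }
}
```

Under these assumptions, only a ledger that is missing from the metadata server is removed from the under-replicated set. A ledger whose metadata still exists but whose data cannot be found on the bookies falls into one of the other two branches and stays eligible for another attempt.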
[GitHub] [pulsar] github-actions[bot] commented on issue #15992: [AutoRecovery] keep rereplicate a ledger which is deleted
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #15992:
URL: https://github.com/apache/pulsar/issues/15992#issuecomment-1179639106
The issue had no activity for 30 days, mark with Stale label.