You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/12/09 14:18:10 UTC
[GitHub] [pulsar] YanshuoH opened a new issue #13219: BKNoSuchLedgerExistsException occurred during offload and the ledger is missing in tiered storage
YanshuoH opened a new issue #13219:
URL: https://github.com/apache/pulsar/issues/13219
Hi,
**Describe the bug**
Got some `BKNoSuchLedgerExistsException` from time to time during offload (to AWS S3).
I've extracted some representative logs to illustrate the timeline.
Firstly, got `Closing ledger 33033 for being full` and `Preparing metadata to offload ledger 33033`, see details below:
<details>
```
13:09:34.216 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO org.apache.bookkeeper.mledger.impl.OpAddEntry - [public/fluentbit-logging/persistent/app-devices] Closing ledger 33033 for being full
13:09:34.219 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Going to automatically offload ledgers [33033], total size = 2147504951, already offloaded = 0, to offload = 2147504951
13:09:34.219 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Preparing metadata to offload ledger 33033 with uuid 98398846-5454-44a1-9327-58d38316137e
13:09:34.223 [pulsar-ordered-OrderedExecutor-3-0-EventThread] INFO org.apache.pulsar.zookeeper.ZooKeeperCache - [State:CONNECTED Timeout:30000 sessionid:0x17ac2600f8400d6 local:/10.120.155.130:35120 remoteserver:zookeeper.pulsar/172.20.51.113:2181 lastZxid:8593149500 xid:428967 sent:428967 recv:441235 queuedpkts:0 pendingresp:0 queuedevents:1] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/managed-ledgers/public/fluentbit-logging/persistent/app-devices
13:09:34.223 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Metadata prepared for offload of ledger 33033 with uuid 98398846-5454-44a1-9327-58d38316137e
13:09:34.368 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Creating a new ledger
13:09:34.368 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[112, 117, 98, 108, 105, 99, 47, 102, 108, 117, 101, 110, 116, 98, 105, 116, 45, 108, 111, 103, 103, 105, 110, 103, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 97, 112, 112, 45, 100, 101, 118, 105, 99, 101, 115], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds
13:09:34.368 [bookkeeper-ml-workers-OrderedExecutor-3-0] WARN org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [bookie-3.bookkeeper.pulsar.svc.cluster.local:3181, bookie-0.bookkeeper.pulsar.svc.cluster.local:3181] is not adhering to Placement Policy. quarantinedBookies: []
13:09:34.374 [main-EventThread] INFO org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [bookie-3.bookkeeper.pulsar.svc.cluster.local:3181, bookie-0.bookkeeper.pulsar.svc.cluster.local:3181] for ledger: 33050
13:09:34.374 [main-EventThread] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Created new ledger 33050
13:09:34.376 [pulsar-ordered-OrderedExecutor-3-0-EventThread] INFO org.apache.pulsar.zookeeper.ZooKeeperCache - [State:CONNECTED Timeout:30000 sessionid:0x17ac2600f8400d6 local:/10.120.155.130:35120 remoteserver:zookeeper.pulsar/172.20.51.113:2181 lastZxid:8593149504 xid:428969 sent:428969 recv:441238 queuedpkts:0 pendingresp:0 queuedevents:1] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/managed-ledgers/public/fluentbit-logging/persistent/app-devices
13:09:34.377 [bookkeeper-ml-workers-OrderedExecutor-3-0] WARN org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Failed to mark delete while trimming data ledgers: Invalid mark deleted position
13:09:34.377 [bookkeeper-ml-workers-OrderedExecutor-3-0] WARN org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Failed to mark delete while trimming data ledgers: Invalid mark deleted position
13:09:34.377 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Ledger 33033 contains the current last confirmed entry 33033:29453, and it is going to be deleted
13:09:34.380 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] End TrimConsumedLedgers. ledgers=1 totalSize=0
13:09:34.380 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Removing ledger 33033 - size: 2147504951
13:09:34.380 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Cleanup offload for ledgerId 33033 uuid 98398846-5454-44a1-9327-58d38316137e because of the reason Trimming.
```
</details>
Then about 4 minutes later, got `EntryLocationIndex - Deleting indexes for ledgers: [33033]` logs on the bookie side.
Then about 1 sec, got the `BKException$BKNoSuchLedgerExistsException: No such ledger exists on Bookies`:
<details>
```
13:13:46.261 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L33033 E20216-E20315, Sent to [bookie-5.bookkeeper.pulsar.svc.cluster.local:3181, bookie-3.bookkeeper.pulsar.svc.cluster.local:3181], Heard from [] : bitset = {}, Error = 'No such ledger exists on Bookies'. First unread entry is (20217, rc = 0)
13:13:46.261 [offloader-OrderedScheduler-1-0] ERROR org.apache.bookkeeper.mledger.offload.jcloud.impl.BlockAwareSegmentInputStreamImpl - Exception when get CompletableFuture<LedgerEntries>.
java.util.concurrent.ExecutionException: org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists on Bookies
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_282]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) ~[?:1.8.0_282]
at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlockAwareSegmentInputStreamImpl.readNextEntriesFromLedger(BlockAwareSegmentInputStreamImpl.java:117) ~[?:?]
at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlockAwareSegmentInputStreamImpl.readEntries(BlockAwareSegmentInputStreamImpl.java:89) ~[?:?]
at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlockAwareSegmentInputStreamImpl.read(BlockAwareSegmentInputStreamImpl.java:157) ~[?:?]
...
...
13:13:46.602 [offloader-OrderedScheduler-1-0] WARN org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Exception occurred during offload
...
...
```
</details>
As result, the ledger 33033 cannot be found in S3.
**To Reproduce**
Sadly I tried to accurately reproduce it in the test environement but nothing comes out yet.
**Expected behavior**
The ledger is offloaded to S3 and no exception raised.
So here is rather **a question**:
As we can see in the log, it says `Ledger 33033 contains the current last confirmed entry 33033:29453, and it is going to be deleted`. Shall we confirm that the missing offloaded ledger of 33033 is intended as shall not be considered as a bug ?
**Additional context**
Using pulsar 2.7.3, deployed in Kubernetes.
Thanks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] hangc0276 closed issue #13219: BKNoSuchLedgerExistsException occurred during offload and the ledger is missing in tiered storage
Posted by GitBox <gi...@apache.org>.
hangc0276 closed issue #13219:
URL: https://github.com/apache/pulsar/issues/13219
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] hangc0276 commented on issue #13219: BKNoSuchLedgerExistsException occurred during offload and the ledger is missing in tiered storage
Posted by GitBox <gi...@apache.org>.
hangc0276 commented on issue #13219:
URL: https://github.com/apache/pulsar/issues/13219#issuecomment-990795572
The ledger 33033 has been deleted by retention during offloading, so it is the expected behavior.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org