You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/12/09 14:18:10 UTC

[GitHub] [pulsar] YanshuoH opened a new issue #13219: BKNoSuchLedgerExistsException occurred during offload and the ledger is missing in tiered storage

YanshuoH opened a new issue #13219:
URL: https://github.com/apache/pulsar/issues/13219


   Hi,
   
   **Describe the bug**
   
   Got some `BKNoSuchLedgerExistsException` from time to time during offload (to AWS S3).
   
   I've extracted some representative logs to illustrate the timeline.
   
   Firstly, got `Closing ledger 33033 for being full` and `Preparing metadata to offload ledger 33033`, see details below:
   
   <details>
   
   ```
   13:09:34.216 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO  org.apache.bookkeeper.mledger.impl.OpAddEntry - [public/fluentbit-logging/persistent/app-devices] Closing ledger 33033 for being full
   13:09:34.219 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Going to automatically offload ledgers [33033], total size = 2147504951, already offloaded = 0, to offload = 2147504951
   13:09:34.219 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Preparing metadata to offload ledger 33033 with uuid 98398846-5454-44a1-9327-58d38316137e
   13:09:34.223 [pulsar-ordered-OrderedExecutor-3-0-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperCache - [State:CONNECTED Timeout:30000 sessionid:0x17ac2600f8400d6 local:/10.120.155.130:35120 remoteserver:zookeeper.pulsar/172.20.51.113:2181 lastZxid:8593149500 xid:428967 sent:428967 recv:441235 queuedpkts:0 pendingresp:0 queuedevents:1] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/managed-ledgers/public/fluentbit-logging/persistent/app-devices
   13:09:34.223 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Metadata prepared for offload of ledger 33033 with uuid 98398846-5454-44a1-9327-58d38316137e
   13:09:34.368 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Creating a new ledger
   13:09:34.368 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[112, 117, 98, 108, 105, 99, 47, 102, 108, 117, 101, 110, 116, 98, 105, 116, 45, 108, 111, 103, 103, 105, 110, 103, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 97, 112, 112, 45, 100, 101, 118, 105, 99, 101, 115], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds
   13:09:34.368 [bookkeeper-ml-workers-OrderedExecutor-3-0] WARN  org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [bookie-3.bookkeeper.pulsar.svc.cluster.local:3181, bookie-0.bookkeeper.pulsar.svc.cluster.local:3181] is not adhering to Placement Policy. quarantinedBookies: []
   13:09:34.374 [main-EventThread] INFO  org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [bookie-3.bookkeeper.pulsar.svc.cluster.local:3181, bookie-0.bookkeeper.pulsar.svc.cluster.local:3181] for ledger: 33050
   13:09:34.374 [main-EventThread] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Created new ledger 33050
   13:09:34.376 [pulsar-ordered-OrderedExecutor-3-0-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperCache - [State:CONNECTED Timeout:30000 sessionid:0x17ac2600f8400d6 local:/10.120.155.130:35120 remoteserver:zookeeper.pulsar/172.20.51.113:2181 lastZxid:8593149504 xid:428969 sent:428969 recv:441238 queuedpkts:0 pendingresp:0 queuedevents:1] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/managed-ledgers/public/fluentbit-logging/persistent/app-devices
   13:09:34.377 [bookkeeper-ml-workers-OrderedExecutor-3-0] WARN  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Failed to mark delete while trimming data ledgers: Invalid mark deleted position
   13:09:34.377 [bookkeeper-ml-workers-OrderedExecutor-3-0] WARN  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Failed to mark delete while trimming data ledgers: Invalid mark deleted position
   13:09:34.377 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Ledger 33033 contains the current last confirmed entry 33033:29453, and it is going to be deleted
   13:09:34.380 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] End TrimConsumedLedgers. ledgers=1 totalSize=0
   13:09:34.380 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Removing ledger 33033 - size: 2147504951
   13:09:34.380 [bookkeeper-ml-workers-OrderedExecutor-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Cleanup offload for ledgerId 33033 uuid 98398846-5454-44a1-9327-58d38316137e because of the reason Trimming.
   ```
   
   </details>
   
   Then about 4 minutes later, got `EntryLocationIndex - Deleting indexes for ledgers: [33033]` logs on the bookie side.
   
   Then about 1 sec, got the `BKException$BKNoSuchLedgerExistsException: No such ledger exists on Bookies`:
   
   <details>
   
   ```
   13:13:46.261 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L33033 E20216-E20315, Sent to [bookie-5.bookkeeper.pulsar.svc.cluster.local:3181, bookie-3.bookkeeper.pulsar.svc.cluster.local:3181], Heard from [] : bitset = {}, Error = 'No such ledger exists on Bookies'. First unread entry is (20217, rc = 0)
   13:13:46.261 [offloader-OrderedScheduler-1-0] ERROR org.apache.bookkeeper.mledger.offload.jcloud.impl.BlockAwareSegmentInputStreamImpl - Exception when get CompletableFuture<LedgerEntries>. 
   java.util.concurrent.ExecutionException: org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists on Bookies
     at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_282]
     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) ~[?:1.8.0_282]
     at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlockAwareSegmentInputStreamImpl.readNextEntriesFromLedger(BlockAwareSegmentInputStreamImpl.java:117) ~[?:?]
     at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlockAwareSegmentInputStreamImpl.readEntries(BlockAwareSegmentInputStreamImpl.java:89) ~[?:?]
     at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlockAwareSegmentInputStreamImpl.read(BlockAwareSegmentInputStreamImpl.java:157) ~[?:?]
   ...
   ...
   13:13:46.602 [offloader-OrderedScheduler-1-0] WARN  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/fluentbit-logging/persistent/app-devices] Exception occurred during offload
   ...
   ...
   ```
   
   </details>
   
   As result, the ledger 33033 cannot be found in S3. 
   
   **To Reproduce**
   
   Sadly I tried to accurately reproduce it in the test environement but nothing comes out yet.
   
   **Expected behavior**
   
   The ledger is offloaded to S3 and no exception raised.
   
   So here is rather **a question**:
   
   As we can see in the log, it says `Ledger 33033 contains the current last confirmed entry 33033:29453, and it is going to be deleted`.  Shall we confirm that the missing offloaded ledger of 33033 is intended as shall not be considered as a bug ?
   
   **Additional context**
   
   Using pulsar 2.7.3, deployed in Kubernetes.
   
   Thanks.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hangc0276 closed issue #13219: BKNoSuchLedgerExistsException occurred during offload and the ledger is missing in tiered storage

Posted by GitBox <gi...@apache.org>.
hangc0276 closed issue #13219:
URL: https://github.com/apache/pulsar/issues/13219


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hangc0276 commented on issue #13219: BKNoSuchLedgerExistsException occurred during offload and the ledger is missing in tiered storage

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on issue #13219:
URL: https://github.com/apache/pulsar/issues/13219#issuecomment-990795572


   The ledger 33033 has been deleted by retention during offloading, so it is the expected behavior.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org