Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/06/11 14:28:37 UTC

[GitHub] [pulsar] jujugrrr opened a new issue #7246: Can't read messages offloaded to S3

jujugrrr opened a new issue #7246:
URL: https://github.com/apache/pulsar/issues/7246


   **Describe the bug**
   Once ledgers are offloaded and removed from local storage, a reader is not able to retrieve the messages.
   
   **To Reproduce**
   1. Produce messages
   2. Read messages from the beginning (SUCCESS)
   3. Roll over ledgers
   4. Offload
   5. Wait for the offload deletion lag
   6. Read messages from the beginning (FAIL - Timeout)
   
   **Expected behavior**
   
   I should still be able to retrieve messages from the beginning of my topic after they have been offloaded to S3.
   
   **Additional context**
   Pulsar on AWS EKS, with the latest helm chart.
   
   **More details**
   
   I'm testing Pulsar offloading to S3. I have one script that produces 1M messages and another that reads them. Reading works the first few times (I re-run from scratch), but then I start to get exceptions:
   
   ```
   10:28:56.701 [pulsar-io-24-1] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [ten/ns/persistent/my-topic-reader-3558c16521] Rewind from 233:0 to 233:0
   10:28:56.701 [pulsar-io-24-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ten/ns/my-topic] There are no replicated subscriptions on the topic
   10:28:56.701 [pulsar-io-24-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ten/ns/my-topic][reader-3558c16521] Created new subscription for 0
   10:28:56.701 [pulsar-io-24-1] INFO  org.apache.pulsar.broker.service.ServerCnx - [/x.x.x.x7:53908] Created subscription on topic persistent://ten/ns/my-topic / reader-3558c16521
   10:28:56.705 [bookkeeper-ml-workers-OrderedExecutor-6-0] WARN  org.apache.bookkeeper.mledger.impl.OpReadEntry - [ten/ns/persistent/my-topic][reader-3558c16521] read failed from ledger at position:233:0 : Unknown exception
   10:28:56.705 [broker-topic-workers-OrderedScheduler-3-0] ERROR org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer - [persistent://ten/ns/my-topic / reader-3558c16521-Consumer{subscription=PersistentSubscription{topic=persistent://ten/ns/my-topic, name=reader-3558c16521}, consumerId=0, consumerName=, address=/x.x.x.x7:53908}] Error reading entries at 233:0 : Unknown exception - Retrying to read in 15.0 seconds
   ```
   
   Those ledgers are being offloaded to S3. It looks like the exception above starts as soon as a ledger is removed from local storage (after the `set-offload-deletion-lag` interval). The broker log shows the deletion:
   
   ```
   10:28:28.452 [bookkeeper-ml-workers-OrderedExecutor-6-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ten/ns/persistent/my-topic] End TrimConsumedLedgers. ledgers=3 totalSize=38942923
   10:28:28.452 [bookkeeper-ml-workers-OrderedExecutor-6-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ten/ns/persistent/my-topic] Deleting offloaded ledger 233 from bookkeeper - size: 15432415
   10:28:28.452 [bookkeeper-ml-workers-OrderedExecutor-6-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ten/ns/persistent/my-topic] Deleting offloaded ledger 234 from bookkeeper - size: 15504438
   - size: 16168356
   ```
   
   Also, I can see that the ledgers were removed from ZooKeeper. Is there a configuration option I'm missing? Is there a way to understand why:
   
   ```
   read failed from ledger at position:233:0 : Unknown exception
   ```
    is happening? Thank you!
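
One way to check the suspected link between ledger deletion and the read failure is to correlate the two kinds of broker log lines quoted above. The following is a minimal Python sketch, not a Pulsar tool: the regular expressions are derived only from the log lines in this report, the function name `correlate` is hypothetical, and it assumes the log lines are fed in chronological order. It shows timing correlation only and does not explain the root cause.

```python
import re

# Patterns derived from the broker log lines quoted in this report (assumption:
# other Pulsar versions may word these messages differently).
TIME_RE = re.compile(r"^\s*(\d{2}:\d{2}:\d{2}\.\d{3})")
DELETE_RE = re.compile(r"Deleting offloaded ledger (\d+) from bookkeeper")
READ_FAIL_RE = re.compile(r"read failed from ledger at position:(\d+):(\d+)")


def correlate(log_lines):
    """Return (ledger_id, deleted_at, failed_at) for every read failure that
    hits a ledger already reported as deleted from BookKeeper.

    Assumes log_lines are in chronological order.
    """
    deleted_at = {}
    hits = []
    for line in log_lines:
        ts = TIME_RE.match(line)
        if ts is None:
            continue  # skip lines without a timestamp (e.g. truncated output)
        stamp = ts.group(1)
        deleted = DELETE_RE.search(line)
        if deleted:
            # Remember the first time each ledger was reported as deleted.
            deleted_at.setdefault(int(deleted.group(1)), stamp)
            continue
        failed = READ_FAIL_RE.search(line)
        if failed:
            ledger = int(failed.group(1))
            if ledger in deleted_at:
                hits.append((ledger, deleted_at[ledger], stamp))
    return hits


# Two lines taken from the logs above, in chronological order.
sample = [
    "10:28:28.452 [bookkeeper-ml-workers-OrderedExecutor-6-0] INFO  "
    "org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - "
    "[ten/ns/persistent/my-topic] Deleting offloaded ledger 233 from bookkeeper - size: 15432415",
    "10:28:56.705 [bookkeeper-ml-workers-OrderedExecutor-6-0] WARN  "
    "org.apache.bookkeeper.mledger.impl.OpReadEntry - "
    "[ten/ns/persistent/my-topic][reader-3558c16521] read failed from ledger at position:233:0 : Unknown exception",
]

print(correlate(sample))  # [(233, '10:28:28.452', '10:28:56.705')]
```

For the sample lines, the read on ledger 233 fails about 28 seconds after the broker logs that ledger's deletion from BookKeeper, which is consistent with the description above.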
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jujugrrr commented on issue #7246: Can't read messages offloaded to S3

Posted by GitBox <gi...@apache.org>.
jujugrrr commented on issue #7246:
URL: https://github.com/apache/pulsar/issues/7246#issuecomment-643172310


   @sijie  I set everything to unlimited:
   
   ```
   $ pulsar-admin namespaces get-retention ten/ns
   {
     "retentionTimeInMinutes" : -1,
     "retentionSizeInMB" : -1
   }
   
   $ pulsar-admin namespaces get-offload-threshold ten/ns
   0
   $ pulsar-admin namespaces get-offload-deletion-lag ten/ns
   1 minute(s)
   ```
   
   Broker conf:
   
   ```
       managedLedgerMinLedgerRolloverTimeMinutes: "1"
       managedLedgerMaxLedgerRolloverTimeMinutes: "2" # We want to rollover quickly so we can offload to S3
       managedLedgerMaxEntriesPerLedger: "5000"
   ```
   
   My focus is on having as much data as possible in tiered storage for now; I'll adjust based on the performance impact. Having those low values speeds up my test process as well.
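
To make the effect of these values concrete, here is a rough Python sketch of how long a ledger's BookKeeper copy may survive with the settings above. It is a simplified model, not Pulsar's actual logic: it assumes the topic is under steady writes, that an offload threshold of 0 starts the offload as soon as a ledger is closed, and that the upload time to S3 (the hypothetical `offload_minutes` field) is negligible. The class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class OffloadSettings:
    min_rollover_minutes: float   # managedLedgerMinLedgerRolloverTimeMinutes
    max_rollover_minutes: float   # managedLedgerMaxLedgerRolloverTimeMinutes
    deletion_lag_minutes: float   # value from get-offload-deletion-lag
    offload_minutes: float = 0.0  # hypothetical: time the S3 upload takes


def local_copy_lifetime(s: OffloadSettings):
    """Return (earliest, latest) minutes after a ledger is opened at which
    its BookKeeper copy may be deleted, under the simplified model above."""
    earliest = s.min_rollover_minutes + s.offload_minutes + s.deletion_lag_minutes
    latest = s.max_rollover_minutes + s.offload_minutes + s.deletion_lag_minutes
    return earliest, latest


# Values from the broker conf and namespace policies quoted above.
settings = OffloadSettings(
    min_rollover_minutes=1,
    max_rollover_minutes=2,
    deletion_lag_minutes=1,
)
print(local_copy_lifetime(settings))  # (2.0, 3.0)
```

Under these assumptions, data is served only from S3 roughly two to three minutes after its ledger is opened, so a test that reads from the beginning shortly after producing will exercise the offloaded read path almost immediately.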
   
   





[GitHub] [pulsar] jujugrrr commented on issue #7246: Can't read messages offloaded to S3

Posted by GitBox <gi...@apache.org>.
jujugrrr commented on issue #7246:
URL: https://github.com/apache/pulsar/issues/7246#issuecomment-644618453


   @sijie Do you have any idea? I compared with someone else's configuration and changed the retention a bit, without luck.





[GitHub] [pulsar] jujugrrr commented on issue #7246: Can't read messages offloaded to S3

Posted by GitBox <gi...@apache.org>.
jujugrrr commented on issue #7246:
URL: https://github.com/apache/pulsar/issues/7246#issuecomment-645409625


   I've upgraded to 2.5.1 and rebuilt a cluster. It's all working as expected now. Not sure if it was a 2.5.0 bug or if it's something I broke while testing.





[GitHub] [pulsar] jujugrrr closed issue #7246: Can't read messages offloaded to S3

Posted by GitBox <gi...@apache.org>.
jujugrrr closed issue #7246:
URL: https://github.com/apache/pulsar/issues/7246


   





[GitHub] [pulsar] sijie commented on issue #7246: Can't read messages offloaded to S3

Posted by GitBox <gi...@apache.org>.
sijie commented on issue #7246:
URL: https://github.com/apache/pulsar/issues/7246#issuecomment-643020356


   @jujugrrr did you enable any retention policy for your namespace?

