You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/01/21 18:06:36 UTC

[GitHub] [pulsar] frankjkelly opened a new issue #9266: Retrieving data from Tiered Storage (S3) fails initially then succeeds after 15 second retry (Pulsar 2.6.1)

frankjkelly opened a new issue #9266:
URL: https://github.com/apache/pulsar/issues/9266


   **Describe the bug**
   Retrieving data from Tiered Storage (S3) fails initially according to Broker logs ("unknown exception") then succeeds after 15 second retry resulting in a 15 second response time minimum.
   I can reproduce this consistently.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   1. Configure Tiered storage offload to S3
   2. Upload data to a topic
   2. Wait for Managed Ledger Rollver
   3. Offload Topic
   4. Attempt to retrieve data
   
   **Expected behavior**
   I expect some delay but not an exception followed by a 15 second retry
   
   **Screenshots**
   Logs
   
   ```
    17:50:18.735 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [cogito-dialog/wav/persistent/57f5ea00-a7a2-49c3-95d3-c8237d893d31] [PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7] Updating cursor info ledgerId=1108490 mark-delete=1107259:-1
   17:50:18.739 [bookkeeper-ml-workers-OrderedExecutor-5-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [cogito-dialog/wav/persistent/57f5ea00-a7a2-49c3-95d3-c8237d893d31] Updated cursor PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7 with ledger id 1108490 md-position=1107259:-1 rd-position=1107259:0
   17:50:18.739 [bookkeeper-ml-workers-OrderedExecutor-5-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [cogito-dialog/wav/persistent/57f5ea00-a7a2-49c3-95d3-c8237d893d31] Opened new cursor: ManagedCursorImpl{ledger=cogito-dialog/wav/persistent/57f5ea00-a7a2-49c3-95d3-c8237d893d31, name=PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7, ackPos=1107259:-1, readPos=1107259:0}
   17:50:18.739 [bookkeeper-ml-workers-OrderedExecutor-5-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [cogito-dialog/wav/persistent/57f5ea00-a7a2-49c3-95d3-c8237d893d31-PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7] Rewind from 1047055:0 to 1047055:0
   17:50:18.739 [bookkeeper-ml-workers-OrderedExecutor-5-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://cogito-dialog/wav/57f5ea00-a7a2-49c3-95d3-c8237d893d31] There are no replicated subscriptions on the topic
   17:50:18.739 [bookkeeper-ml-workers-OrderedExecutor-5-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://cogito-dialog/wav/57f5ea00-a7a2-49c3-95d3-c8237d893d31][PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7] Created new subscription for 0
   17:50:18.739 [bookkeeper-ml-workers-OrderedExecutor-5-0] INFO  org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:43862] Created subscription on topic persistent://cogito-dialog/wav/57f5ea00-a7a2-49c3-95d3-c8237d893d31 / PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7
   17:50:18.771 [bookkeeper-ml-workers-OrderedExecutor-5-0] WARN  org.apache.bookkeeper.mledger.impl.OpReadEntry - [cogito-dialog/wav/persistent/57f5ea00-a7a2-49c3-95d3-c8237d893d31][PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7] read failed from ledger at position:1047055:0 : Unknown exception
   17:50:18.771 [broker-topic-workers-OrderedScheduler-4-0] ERROR org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer - [persistent://cogito-dialog/wav/57f5ea00-a7a2-49c3-95d3-c8237d893d31 / PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7-Consumer{subscription=PersistentSubscription{topic=persistent://cogito-dialog/wav/57f5ea00-a7a2-49c3-95d3-c8237d893d31, name=PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7}, consumerId=0, consumerName=8f423, address=/127.0.0.1:43862}] Error reading entries at 1047055:0 : Unknown exception - Retrying to read in 15.0 seconds
   ```
   
   There was no stack trace for that error
   
   **Desktop (please complete the following information):**
    - OS: Linux
    - Pulsar: 2.6.1
   
   **Additional context**
   
   Tiered Offload Settings
   ```
   $ ./apache-pulsar-2.6.1/bin/pulsar-admin --admin-url http://platform-pulsar-broker:8080 namespaces get-offload-threshold cogito-dialog/wav
   0
   $ ./apache-pulsar-2.6.1/bin/pulsar-admin --admin-url http://platform-pulsar-broker:8080 namespaces get-offload-deletion-lag cogito-dialog/wav
   15 minute(s)
   ```
   Broker Overrides to defaults
   ```
           managedLedgerMaxLedgerRolloverTimeMinutes: "90"
   ```
   
   Other Broker settings
   ```
   # grep ffload broker.conf  | grep -v "#"
   managedLedgerOffloadDeletionLagMs=14400000
   managedLedgerOffloadAutoTriggerSizeThresholdBytes=-1
   offloadersDirectory=./offloaders
   managedLedgerOffloadDriver=aws-s3
   managedLedgerOffloadMaxThreads=2
   managedLedgerOffloadPrefetchRounds=1
   s3ManagedLedgerOffloadRegion=us-east-1
   s3ManagedLedgerOffloadBucket=eks-saas-pulsar-455275ef9748510192359f2f92675b2e
   s3ManagedLedgerOffloadServiceEndpoint=
   s3ManagedLedgerOffloadMaxBlockSizeInBytes=67108864
   s3ManagedLedgerOffloadReadBufferSizeInBytes=1048576
   gcsManagedLedgerOffloadRegion=
   gcsManagedLedgerOffloadBucket=
   gcsManagedLedgerOffloadMaxBlockSizeInBytes=67108864
   gcsManagedLedgerOffloadReadBufferSizeInBytes=1048576
   gcsManagedLedgerOffloadServiceAccountKeyFile=
   fileSystemProfilePath=../conf/filesystem_offload_core_site.xml
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] frankjkelly commented on issue #9266: Retrieving data from Tiered Storage (S3) fails initially then succeeds after 15 second retry (Pulsar 2.6.1)

Posted by GitBox <gi...@apache.org>.
frankjkelly commented on issue #9266:
URL: https://github.com/apache/pulsar/issues/9266#issuecomment-765441296


   Good idea - I looked through the broker conf files and could not find where this was specified


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] frankjkelly commented on issue #9266: Retrieving data from Tiered Storage (S3) fails initially then succeeds after 15 second retry (Pulsar 2.6.1)

Posted by GitBox <gi...@apache.org>.
frankjkelly commented on issue #9266:
URL: https://github.com/apache/pulsar/issues/9266#issuecomment-781607203


   @codelipenghui it appears now that there might be a plan for a 2.6.4 release (I noticed the release label on https://github.com/apache/pulsar/pull/7356). Any chance https://github.com/apache/pulsar/pull/9323 could be back-ported?
   Thanks in advance. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] frankjkelly commented on issue #9266: Retrieving data from Tiered Storage (S3) fails initially then succeeds after 15 second retry (Pulsar 2.6.1)

Posted by GitBox <gi...@apache.org>.
frankjkelly commented on issue #9266:
URL: https://github.com/apache/pulsar/issues/9266#issuecomment-1059155783


   No longer an issue for us - we wrote our own S3 offloader for various reasons.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on issue #9266: Retrieving data from Tiered Storage (S3) fails initially then succeeds after 15 second retry (Pulsar 2.6.1)

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #9266:
URL: https://github.com/apache/pulsar/issues/9266#issuecomment-1058893998


   The issue had no activity for 30 days, mark with Stale label.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] frankjkelly closed issue #9266: Retrieving data from Tiered Storage (S3) fails initially then succeeds after 15 second retry (Pulsar 2.6.1)

Posted by GitBox <gi...@apache.org>.
frankjkelly closed issue #9266:
URL: https://github.com/apache/pulsar/issues/9266


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on issue #9266: Retrieving data from Tiered Storage (S3) fails initially then succeeds after 15 second retry (Pulsar 2.6.1)

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #9266:
URL: https://github.com/apache/pulsar/issues/9266#issuecomment-769087662


   Hi, @frankjkelly. Sorry, the next release version is 2.7.1 and we do not have a plan to release 2.6.x. If you want to apply this patch, we can help to cherry-pick this fix to the branch-2.6 and you can build it by yourself. 
   
   And the streamnative.io have a weekly release at https://github.com/streamnative/pulsar/releases, https://hub.docker.com/r/streamnative/pulsar/tags?page=1&ordering=last_updated. This release is based on the official branch. We can start a quick release for 2.6.3.x with this path well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] frankjkelly commented on issue #9266: Retrieving data from Tiered Storage (S3) fails initially then succeeds after 15 second retry (Pulsar 2.6.1)

Posted by GitBox <gi...@apache.org>.
frankjkelly commented on issue #9266:
URL: https://github.com/apache/pulsar/issues/9266#issuecomment-806636632


   FYI could be a dupe of https://github.com/apache/pulsar/issues/8282


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] frankjkelly commented on issue #9266: Retrieving data from Tiered Storage (S3) fails initially then succeeds after 15 second retry (Pulsar 2.6.1)

Posted by GitBox <gi...@apache.org>.
frankjkelly commented on issue #9266:
URL: https://github.com/apache/pulsar/issues/9266#issuecomment-769079148


   Thanks indeed - we are using Pulsar 2.6.1 - is there any chance this could be back ported to the next maintenance release (2.6.4?)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] apapia commented on issue #9266: Retrieving data from Tiered Storage (S3) fails initially then succeeds after 15 second retry (Pulsar 2.6.1)

Posted by GitBox <gi...@apache.org>.
apapia commented on issue #9266:
URL: https://github.com/apache/pulsar/issues/9266#issuecomment-765439737


   Is it possible to reduce the retry time (15s) as a workaround?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on issue #9266: Retrieving data from Tiered Storage (S3) fails initially then succeeds after 15 second retry (Pulsar 2.6.1)

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #9266:
URL: https://github.com/apache/pulsar/issues/9266#issuecomment-768842245


   Thanks @Renkai 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Renkai commented on issue #9266: Retrieving data from Tiered Storage (S3) fails initially then succeeds after 15 second retry (Pulsar 2.6.1)

Posted by GitBox <gi...@apache.org>.
Renkai commented on issue #9266:
URL: https://github.com/apache/pulsar/issues/9266#issuecomment-767402695


   I have submitted a pull request to show more info about the "Unknown Exception", https://github.com/apache/pulsar/pull/9323
   It's better to decide how to deal with it(retry or do something extra) after we actually know what it is.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org