Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/07/10 06:45:28 UTC

[GitHub] [pulsar] hozumi opened a new issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

hozumi opened a new issue #7500:
URL: https://github.com/apache/pulsar/issues/7500


   I couldn't get a clear understanding of Pulsar's behavior from the documentation in the following situation.
   
   Suppose a namespace is configured with a 100GB retention policy and a 1GB backlog quota, and 100GB of messages are published while there is no subscription. What will happen if a subscription with the `SubscriptionInitialPosition.Earliest` setting is then created?
   Does the backlog of the newly created subscription exceed the 1GB backlog quota?
   Will the producer stop immediately because of the `producer_request_hold` policy?
   
   I originally discussed this question on Slack and was advised to raise an issue to get clarification.
   https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1594348193122700


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] sijie commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

Posted by GitBox <gi...@apache.org>.
sijie commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-659951570


   @hozumi 
   
   > should the backlog quota be set to the same size as the retention policy? 
   
   Yes.
   
   > If so, since the retention policy applies only to messages that are not in the backlog, do I need to estimate the maximum disk space as twice the size of the retention policy?
   
   If your consumers acknowledge messages, you can estimate the maximum disk space as write_quorum_size * the size of the retention policy.
   
   You can also set the TTL to match the retention policy. That way, if some subscriptions forget to acknowledge messages, the data will still eventually be cleaned up after the retention period. 
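As a rough illustration of how a TTL cleans up after subscriptions that never acknowledge, here is a simplified Python model. It is hypothetical and is not Pulsar's implementation: each subscription is reduced to a single mark-delete index over a list of publish timestamps (in seconds), and expiring a message is modeled as moving that index past it, as if it had been acknowledged. From then on, the retention policy governs how long the acknowledged data is kept.

```python
from dataclasses import dataclass


# Hypothetical model, not Pulsar's implementation: a subscription is reduced
# to the index of its last contiguously acknowledged message.
@dataclass
class Subscription:
    name: str
    mark_delete: int = -1  # -1 means nothing has been acknowledged yet


def expire_by_ttl(sub: Subscription, publish_times: list[float],
                  now: float, ttl: float) -> int:
    """Treat every unacknowledged message older than `ttl` seconds as
    acknowledged by moving the cursor past it. Returns how many expired."""
    expired = 0
    i = sub.mark_delete + 1
    # Messages are stored in publish order, so stop at the first young one.
    while i < len(publish_times) and now - publish_times[i] > ttl:
        i += 1
        expired += 1
    sub.mark_delete = i - 1
    return expired


sub = Subscription("forgot-to-ack")
print(expire_by_ttl(sub, [0, 10, 20, 30], now=35, ttl=15))  # 2
print(sub.mark_delete)                                       # 1
```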
   





[GitHub] [pulsar] sijie commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

sijie commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-667297329


   @hozumi You are correct in theory. However, the current implementation treats the largest subscription backlog as the topic backlog. 
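The rule described here fits in a few lines. The sketch below is a hypothetical Python model, not Pulsar code: it takes per-subscription backlog sizes (in GB, as example values) and uses the largest one as the topic backlog that is compared against the backlog quota.

```python
def topic_backlog(subscription_backlogs: dict[str, int]) -> int:
    """Topic backlog as described in this thread: the largest backlog of any
    subscription on the topic (0 when the topic has no subscriptions)."""
    return max(subscription_backlogs.values(), default=0)


def over_quota(subscription_backlogs: dict[str, int], quota: int) -> bool:
    # The quota is checked against the single largest backlog, not the sum.
    return topic_backlog(subscription_backlogs) > quota


backlogs_gb = {"sub-a": 60, "sub-b": 60}
print(topic_backlog(backlogs_gb))    # 60
print(over_quota(backlogs_gb, 100))  # False, even though 60 + 60 = 120
```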





[GitHub] [pulsar] rvashishth commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

rvashishth commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-659914269


   There seems to be a lot of confusion around these concepts, and no clear documentation is available anywhere. 





[GitHub] [pulsar] hozumi commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

hozumi commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-667324457


   I see.
   I'm also interested in how the retention policy and backlog quotas affect compacted topics. There doesn't seem to be much information on this.





[GitHub] [pulsar] hozumi commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

hozumi commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-657975830


   @sijie Thank you for your answer.
   That is to say, in order for a subscription created later to read all retained messages without interfering with the producer, should the backlog quota be set to the same size as the retention policy? (In this case, 100GB.)
   If so, since the retention policy applies only to messages that are not in the backlog, do I need to estimate the maximum disk space as twice the size of the retention policy?





[GitHub] [pulsar] rvashishth commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

rvashishth commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-660278893


   Let me know if this understanding is correct. `backlogQuotaDefaultLimitGB` is applied per topic and is a logical limit within persistent storage. Pulsar applies this limit to the largest subscription backlog of the topic.
   
   There is no separate physical storage for the topic backlog quota; it is a logical threshold that governs the maximum amount of unacknowledged messages any subscription can have.
   
   Once the topic backlog quota is reached, we can choose to apply one of the `producer_request_hold`, `producer_exception`, or `consumer_backlog_eviction` policies, where `consumer_backlog_eviction` marks the oldest messages in the backlog as acknowledged.
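The three policies in this understanding can be sketched as a small decision function. This is a hypothetical, in-memory Python model, not the broker's code: only the policy names come from the thread, while `on_publish`, `QuotaExceeded`, and the "held" return value are illustrative stand-ins, and a backlog is simplified to a queue of message sizes, oldest first.

```python
from collections import deque
from enum import Enum


class QuotaPolicy(Enum):
    # The three policy names discussed in this thread.
    PRODUCER_REQUEST_HOLD = "producer_request_hold"
    PRODUCER_EXCEPTION = "producer_exception"
    CONSUMER_BACKLOG_EVICTION = "consumer_backlog_eviction"


class QuotaExceeded(Exception):
    """Hypothetical error standing in for the producer-side failure."""


def on_publish(backlogs: dict[str, deque[int]], quota: int,
               policy: QuotaPolicy) -> str:
    """Decide what happens to a publish request in this simplified model.

    `backlogs` maps a subscription name to the sizes of its unacknowledged
    messages, oldest first. Only the largest backlog is checked.
    """
    largest = max((sum(q) for q in backlogs.values()), default=0)
    if largest <= quota:
        return "accepted"
    if policy is QuotaPolicy.PRODUCER_REQUEST_HOLD:
        return "held"  # the producer waits until the backlog shrinks
    if policy is QuotaPolicy.PRODUCER_EXCEPTION:
        raise QuotaExceeded(f"backlog {largest} exceeds quota {quota}")
    # consumer_backlog_eviction: mark the oldest messages as acknowledged
    # (drop them from the backlog) until every backlog fits the quota.
    for q in backlogs.values():
        while sum(q) > quota:
            q.popleft()
    return "accepted"


backlogs = {"slow": deque([4, 4, 4]), "fast": deque([1])}
print(on_publish(backlogs, 10, QuotaPolicy.PRODUCER_REQUEST_HOLD))      # held
print(on_publish(backlogs, 10, QuotaPolicy.CONSUMER_BACKLOG_EVICTION))  # accepted
print(list(backlogs["slow"]))                                           # [4, 4]
```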





[GitHub] [pulsar] rvashishth commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

rvashishth commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-660086383


   > all messages, both acknowledged and unacknowledged, count toward the size limit of the retention policy?
   
   I assume the retention policy size limit should not apply to unacknowledged messages. 





[GitHub] [pulsar] hozumi commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

hozumi commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-667233606


   @sijie That is nice! I appreciate your help.
   One thing I noticed is that you described the topic backlog as the backlog of the slowest subscription.
   I think this might still be inaccurate, because cursor == offset + individual deletes.
   To find all unacknowledged messages, in addition to the offset of the slowest subscription, we need to merge the individual deletes of all subscriptions.
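The point about individual deletes can be made concrete with a small hypothetical Python model (not Pulsar's cursor implementation). Entry IDs are plain integers here, whereas real positions are (ledger, entry) pairs; a cursor is a mark-delete position plus a set of entries acknowledged individually beyond it.

```python
from dataclasses import dataclass, field


# Hypothetical model: entry IDs are plain integers here, whereas real
# positions are (ledger, entry) pairs.
@dataclass
class Cursor:
    mark_delete: int  # every entry up to and including this one is acked
    individually_deleted: set[int] = field(default_factory=set)  # acks past it

    def unacked(self, last_entry: int) -> set[int]:
        """Entries after the mark-delete position that are not acked yet."""
        return {e for e in range(self.mark_delete + 1, last_entry + 1)
                if e not in self.individually_deleted}


def all_unacked(cursors: list[Cursor], last_entry: int) -> set[int]:
    """Union of every subscription's unacked entries (merging the
    individual deletes), rather than everything after the slowest offset."""
    result: set[int] = set()
    for cursor in cursors:
        result |= cursor.unacked(last_entry)
    return result


a = Cursor(mark_delete=2, individually_deleted={4, 5})
b = Cursor(mark_delete=4)
print(sorted(all_unacked([a, b], last_entry=6)))  # [3, 5, 6]
# Counting from the slowest offset alone (a, at 2) would give [3, 4, 5, 6].
```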





[GitHub] [pulsar] rvashishth edited a comment on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

rvashishth edited a comment on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-659962468


   @hozumi @sijie If `SubscriptionInitialPosition.Earliest` creates a subscription starting from the earliest message in `Retention` storage, how can we create a subscription starting from the earliest position in the topic backlogQuota, i.e. the earliest unacknowledged message? 





[GitHub] [pulsar] rvashishth commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

rvashishth commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-659962468


   @hozumi @sijie If `SubscriptionInitialPosition.Earliest` creates a subscription starting from the earliest message in `Retention` storage, how can we create a subscription starting from the earliest position in the backlogQuota? 





[GitHub] [pulsar] hozumi closed issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

hozumi closed issue #7500:
URL: https://github.com/apache/pulsar/issues/7500


   





[GitHub] [pulsar] hozumi commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

hozumi commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-660079467


   > If your consumers acknowledge messages, you can estimate the maximum disk space as write_quorum_size * the size of the retention policy.
   
   Thank you for your answer.
   
   https://pulsar.apache.org/docs/en/cookbooks-retention-expiry/
   > The retention policy settings do not affect unacknowledged messages on topics with subscriptions
   
   Does the above document mean that, although the messages in a consumer's backlog are not deleted by the retention policy settings, all messages, both acknowledged and unacknowledged, count toward the size limit of the retention policy?
   If so, it makes sense: I can manage the maximum disk space using only the retention policy and write_quorum_size.
   I thought that the retention policy and backlog quota would control the size limit of acknowledged messages and unacknowledged messages independently.





[GitHub] [pulsar] sijie commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

sijie commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-657912182


   > Does the backlog of the newly created subscription exceed the 1GB backlog quota?
   
   Yes, because the backlog will exceed the 1GB limit.
   
   > Will the producer stop immediately because of the producer_request_hold policy?
   
   Yes. That's correct.
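The scenario from the original question can be checked with a few lines of arithmetic. This is a hypothetical Python sketch, not Pulsar code: the strings "Earliest" and "Latest" stand in for the Java client's `SubscriptionInitialPosition.Earliest` and `SubscriptionInitialPosition.Latest`, `new_subscription_backlog` is an illustrative helper, and GB is taken as 1024^3 bytes for simplicity.

```python
GB = 1024 ** 3  # assumption for this sketch


def new_subscription_backlog(retained_bytes: int, initial_position: str) -> int:
    """Backlog of a brand-new subscription in this simplified model.

    "Earliest" starts at the first retained message, so everything still
    retained is unacknowledged for the new subscription; "Latest" starts
    at the end of the topic, so its backlog starts empty.
    """
    if initial_position == "Earliest":
        return retained_bytes
    if initial_position == "Latest":
        return 0
    raise ValueError(f"unknown initial position: {initial_position}")


retained = 100 * GB   # published while the topic had no subscription
backlog_quota = 1 * GB

backlog = new_subscription_backlog(retained, "Earliest")
print(backlog > backlog_quota)  # True: the quota is exceeded immediately
```

Under this model, the new subscription's backlog equals everything retained, which is why a later-created `Earliest` subscription needs a backlog quota at least as large as the retained data.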





[GitHub] [pulsar] hozumi commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

hozumi commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-660198086


   > If SubscriptionInitialPosition.Earliest creates a subscription starting from the earliest message in Retention storage, how can we create a subscription starting from the earliest position in the topic backlogQuota, i.e. the earliest unacknowledged message?
   
   @rvashishth According to Addison Higham, a message in a topic does not itself have an acknowledged flag; each subscription has its own backlog, which tracks that subscription's unacknowledged messages. Therefore, the position of the earliest unacknowledged message varies from subscription to subscription.
   
   https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1594358745127000?thread_ts=1594348193.122700&cid=C5Z4T36F7
   > A subscription is basically a pointer into a topic, talking about acknowledged messages only really applies to a subscription, not the underlying topic
   





[GitHub] [pulsar] sijie commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

sijie commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-660379186


   @rvashishth Your understanding is correct.





[GitHub] [pulsar] hozumi commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

hozumi commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-667083124


   https://youtu.be/PIX570nyq_c?t=738
   Thank you for describing backlog and retention clearly.
   
   I think it would be very helpful to include the above explanation in the documentation.
   
   Here is a note correcting my previous understanding.
   
   For a subscription created later to consume all retained messages, the backlog quota should be set larger than the current retained size.
   
   On maximum storage size estimation:
   
   > Does the above document mean that, although the messages in a consumer's backlog are not deleted by the retention policy settings, all messages, both acknowledged and unacknowledged, count toward the size limit of the retention policy?
   > If so, it makes sense: I can manage the maximum disk space using only the retention policy and write_quorum_size.
   
   This is wrong.
   
   ```
   #6 Backlog quota sets a CAP on unacked messages.
   #8 Retention Policy defines how to handle acked messages.
    Storage Size = Backlog Size + Retained Messages Size
   ```
   (from the video above)
   
   Taking unacknowledged messages into account, the maximum storage size should be estimated as (backlog quota size + retention policy size) * write_quorum_size, not simply as retention policy size * write_quorum_size.
   (I ignored segments for simplicity; the actual storage size will be the sum of the underlying segments.)
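The corrected estimate is simple arithmetic, sketched here as a hypothetical Python helper. It assumes both limits apply per topic and, like the comment, ignores segment (ledger) granularity; the write quorum of 3 is an example value only, not something stated in the thread.

```python
def max_storage_gb(backlog_quota_gb: float, retention_size_gb: float,
                   write_quorum_size: int) -> float:
    """Rough upper bound on disk usage for one topic, following the formula
    in this comment: (backlog quota + retention size) * write quorum size.
    Segment (ledger) granularity is ignored, as in the comment."""
    return (backlog_quota_gb + retention_size_gb) * write_quorum_size


# Example values only: the scenario from this issue with a write quorum of 3.
print(max_storage_gb(1, 100, 3))    # 303
print(max_storage_gb(100, 100, 3))  # 600, with the quota raised to 100GB
```

Raising the backlog quota to match the retention size, as suggested earlier in the thread, roughly doubles the worst-case estimate compared with counting retention alone.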
   





[GitHub] [pulsar] rvashishth commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

rvashishth commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-660245009


   > I thought that the retention policy and backlog quota would control the size limit of acknowledged messages and unacknowledged messages independently.
   
   @hozumi I was thinking the same; this might be the reason for the whole confusion. I assume backlogQuota and the subscription/message backlog are two different things, where the message backlog is just a cursor and backlogQuota is the actual storage of the unacked message backlog. 
   
   https://pulsar.apache.org/docs/en/cookbooks-retention-expiry/#backlog-quotas
   > Backlogs are sets of unacknowledged messages for a topic that have been stored by bookies. Pulsar **stores all unacknowledged messages in backlogs** until they are processed and acknowledged.
   
   @addisonj you mentioned _Beyond subscriptions, topics themselves just have retention_. Is the per-topic backlogQuota then just a size-based pointer shared by all subscriptions, which marks unacked messages beyond the size threshold as acked? 





[GitHub] [pulsar] sijie commented on issue #7500: What is expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting?

sijie commented on issue #7500:
URL: https://github.com/apache/pulsar/issues/7500#issuecomment-667197899


   @hozumi Yes. We are going to convert the video into blog posts and eventually update the docs.

