Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/02/28 15:11:08 UTC

[GitHub] [pulsar] aloyszhang opened a new issue #14495: [Feature] Support ManagedLedger EntryCache cache entires before the slowest cursor

aloyszhang opened a new issue #14495:
URL: https://github.com/apache/pulsar/issues/14495


   **Is your feature request related to a problem? Please describe.**
   A topic may have many subscriptions in some scenarios, such as an advertising recommendation system. Such a system may be updated very frequently, and on each update all subscriptions need to seek back to a fixed point in time (one or two hours earlier).
   
   We run a Pulsar cluster for a recommendation business. Every topic has 200~400 subscriptions, and all subscriptions need to seek back 2 hours whenever the recommendation system restarts (for an update or upgrade). We want to catch up on the entire backlog as quickly as possible.
   
   That is, all subscriptions normally consume data as tailing reads. When the user's system restarts, all subscriptions seek to 2 hours ago, read the whole backlog, and then continue consuming new data.
   
   But even after we set both `managedLedgerCacheEvictionTimeThresholdMillis` and `managedLedgerCursorBackloggedThreshold` to very large values, reads still miss the EntryCache after `consumer.seek()`. This is caused by the `cacheEvictionTask`, which always removes all entries already read by the active cursors.
   
   ```java
   void doCacheEviction(long maxTimestamp) {
       // Always remove all entries already read by active cursors
       PositionImpl slowestReaderPos = getEarlierReadPositionForActiveCursors();
       if (slowestReaderPos != null) {
           entryCache.invalidateEntries(slowestReaderPos);
       }

       // Remove entries older than the cutoff threshold
       entryCache.invalidateEntriesBeforeTimestamp(maxTimestamp);
   }
   ```
   
   
   **Describe the solution you'd like**
   We can add a new configuration, `managedLedgerCacheEvictionSkipSlowestCursor`, to control whether the `cacheEvictionTask` removes the entries already read by all active cursors.
   By default, `managedLedgerCacheEvictionSkipSlowestCursor` is false, which keeps the existing behavior. For scenarios that need to read the backlog from the EntryCache, the user can set `managedLedgerCacheEvictionSkipSlowestCursor` to true, which keeps all entries cached until they reach the `managedLedgerCacheEvictionTimeThresholdMillis`.
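
A minimal, self-contained sketch of how the proposed flag could change eviction. It is not Pulsar code: `EvictionSketch`, `SimpleEntryCache`, and the `skipSlowestCursor` parameter are hypothetical stand-ins, positions are plain `long` values instead of `PositionImpl`, and treating the slowest read position as an exclusive bound is an assumption.

```java
import java.util.TreeMap;

public class EvictionSketch {

    /** Hypothetical stand-in for the entry cache: position -> insertion timestamp (ms). */
    static final class SimpleEntryCache {
        private final TreeMap<Long, Long> entries = new TreeMap<>();

        void insert(long position, long timestampMillis) {
            entries.put(position, timestampMillis);
        }

        /** Drops every entry strictly before the given position (exclusive bound is an assumption). */
        void invalidateEntries(long position) {
            entries.headMap(position, false).clear();
        }

        /** Drops every entry inserted before the cutoff timestamp. */
        void invalidateEntriesBeforeTimestamp(long cutoffMillis) {
            entries.values().removeIf(ts -> ts < cutoffMillis);
        }

        boolean contains(long position) {
            return entries.containsKey(position);
        }

        int size() {
            return entries.size();
        }
    }

    /**
     * Sketch of the eviction step with the proposed switch.
     * skipSlowestCursor models managedLedgerCacheEvictionSkipSlowestCursor (hypothetical wiring).
     */
    static void doCacheEviction(SimpleEntryCache cache, Long slowestReadPosition,
                                long maxTimestamp, boolean skipSlowestCursor) {
        // Current behavior: drop everything the slowest active cursor has already read.
        if (!skipSlowestCursor && slowestReadPosition != null) {
            cache.invalidateEntries(slowestReadPosition);
        }
        // In both modes: drop entries older than the time threshold.
        cache.invalidateEntriesBeforeTimestamp(maxTimestamp);
    }

    /** Builds a cache with positions 0..9 whose timestamps are 1000, 1100, ..., 1900. */
    static SimpleEntryCache filledCache() {
        SimpleEntryCache cache = new SimpleEntryCache();
        for (long pos = 0; pos < 10; pos++) {
            cache.insert(pos, 1000 + pos * 100);
        }
        return cache;
    }

    public static void main(String[] args) {
        SimpleEntryCache defaultMode = filledCache();
        doCacheEviction(defaultMode, 8L, 1300L, false);
        System.out.println("flag=false keeps " + defaultMode.size() + " entries"); // prints 2

        SimpleEntryCache skipMode = filledCache();
        doCacheEviction(skipMode, 8L, 1300L, true);
        System.out.println("flag=true keeps " + skipMode.size() + " entries"); // prints 7
    }
}
```

With the flag off, only the entries at or after the slowest read position survive, so a later `consumer.seek()` to an older position misses the cache. With the flag on, entries stay until the time threshold removes them, at the cost of holding more memory.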
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] aloyszhang commented on issue #14495: [Feature] Support ManagedLedger EntryCache cache entires before the slowest cursor

Posted by GitBox <gi...@apache.org>.
aloyszhang commented on issue #14495:
URL: https://github.com/apache/pulsar/issues/14495#issuecomment-1056095051


   > The current solution looks like it provides a catchup cache by extending the tailing cache, but I think it will not work for cases where multiple cursors read from an earlier position, such as 3 days ago, while the entry cache only covers 2 hours.
   
   Yes, it reuses the entry cache, and reads will miss if we read entries outside the range of the cached data.
   
   But if we add a separate catchup cache, the catchup cache and the tailing cache may overlap, which may waste some memory.
   
   > It would be better to start a discussion on the dev mailing list or start with a proposal.
   
   Agreed, we can discuss it first.
   





[GitHub] [pulsar] github-actions[bot] commented on issue #14495: [Feature] Support ManagedLedger EntryCache cache entires before the slowest cursor

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #14495:
URL: https://github.com/apache/pulsar/issues/14495#issuecomment-1086484755


   The issue had no activity for 30 days, mark with Stale label.





[GitHub] [pulsar] codelipenghui commented on issue #14495: [Feature] Support ManagedLedger EntryCache cache entires before the slowest cursor

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #14495:
URL: https://github.com/apache/pulsar/issues/14495#issuecomment-1055015211


   It looks like we want to add a catchup entry cache. The current entry cache is used for tailing messages. The current solution looks like it provides a catchup cache by extending the tailing cache, but I think it will not work for cases where multiple cursors read from an earlier position, such as 3 days ago, while the entry cache only covers 2 hours. Essentially, the issue is how to use the cache to speed up catchup reads across multiple subscriptions.
   
   It would be better to start a discussion on the dev mailing list or start with a proposal.

