Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/12/31 08:02:02 UTC

[GitHub] [pulsar] wolfstudy commented on issue #18128: [PIP-219] Support full scan and trim ledger

wolfstudy commented on issue #18128:
URL: https://github.com/apache/pulsar/issues/18128#issuecomment-1368181773

   > ### Motivation
   > Broker uses the `Trimledgers` thread to clean up outdated ledgers. During cleaning, each Broker traverses the topic metadata in memory to find the ledgers that have reached the retention or TTL threshold. However, this approach has a problem: when a topic has no producers or consumers, the Broker removes the topic's metadata from memory. As a result, the ledgers of such topics are never deleted. Therefore, we need a way to scan and clean all outdated ledgers.
   > 
   > ### Goal
   > A full scan causes a large number of requests to ZooKeeper. Therefore, the existing cleanup mode will be retained and a full scan mode will be added alongside it.
   > 
   > ### API Changes
   > 1. Add a new scheduling thread pool
   > 2. Add the following configuration item:
   >    // Full scan interval. The full scan is enabled only when this value is greater than 0.
   >    fullScanTrimLedgerInterval=0
   >    // Maximum number of metadata requests per second during scanning
   >    fullScanMaximumMetadataConcurrencyPerSecond=200
   > 3. Add a `TrimLedger` admin API.
   > 
   > ### Implementation
   > 1. Only the Leader Broker performs the full scan.
   > 2. The Leader Broker traverses the `managedLedger` entries of each namespace in the metadata store. Because Ledger metadata contains the creation time, a Ledger should be deleted if the time since its creation exceeds the retention time plus the TTL time, or if the size threshold is exceeded.
   >    Only the metadata of each Ledger is parsed; topics are not loaded into memory.
   >    The metadata request frequency is limited using a semaphore.
   > 3. When a topic meets the conditions, the Leader triggers the `TrimLedger` admin API. Because the admin API verifies the ownership of the topic and redirects the request, the topic is loaded by the corresponding Broker.
   >    After cleaning is done, the corresponding Broker closes the topic to release memory. (Before closing, it checks whether the topic has producers or consumers, and closes the topic only if it has none.)
   > 
   > ### Alternatives
   > _No response_
   > 
   > ### Anything else?
   > _No response_
   
   Nice work. We have encountered the same problem; for details, see: https://github.com/apache/pulsar/issues/19077
   
   The current Ledger cleaning mechanism of Pulsar Broker is indeed not rigorous enough, which causes a lot of dirty data to be left behind in Bookie's EntryLog.
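
For illustration only, the scan step in the quoted proposal might look roughly like the Java sketch below. Every name in it (`FullScanSketch`, `LedgerMeta`, `isExpired`, `findExpired`, the `reader` function) is hypothetical and not part of Pulsar. It assumes the age check is "time since creation exceeds retention plus TTL", leaves out the size-based condition, and uses the semaphore to cap the number of in-flight metadata reads, which is simpler than a strict per-second limit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical sketch; none of these types are Pulsar APIs.
class FullScanSketch {

    // Minimal stand-in for the ledger metadata read from the metadata store.
    static final class LedgerMeta {
        final long ledgerId;
        final long creationTimeMs;

        LedgerMeta(long ledgerId, long creationTimeMs) {
            this.ledgerId = ledgerId;
            this.creationTimeMs = creationTimeMs;
        }
    }

    // Assumed rule: a ledger is outdated once its age exceeds retention + TTL.
    static boolean isExpired(LedgerMeta meta, long nowMs, long retentionMs, long ttlMs) {
        return nowMs - meta.creationTimeMs > retentionMs + ttlMs;
    }

    // Reads metadata for each ledger id and collects the outdated ones.
    // The semaphore bounds how many reads are outstanding at any moment.
    static List<Long> findExpired(List<Long> ledgerIds,
                                  Function<Long, CompletableFuture<LedgerMeta>> reader,
                                  Semaphore permits,
                                  long nowMs, long retentionMs, long ttlMs) {
        List<Long> expired = Collections.synchronizedList(new ArrayList<>());
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (Long id : ledgerIds) {
            permits.acquireUninterruptibly(); // wait for a free slot
            pending.add(reader.apply(id)
                    .thenAccept(meta -> {
                        if (isExpired(meta, nowMs, retentionMs, ttlMs)) {
                            expired.add(meta.ledgerId);
                        }
                    })
                    // Free the slot whether the read succeeded or failed.
                    .whenComplete((ignored, error) -> permits.release()));
        }
        // Wait for all outstanding reads before returning.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        return expired;
    }
}
```

In this sketch the Leader would pass the resulting ledger ids (or their topics) on to the `TrimLedger` admin API; that call is not shown because its signature is not specified in the proposal.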

