Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2021/08/28 16:00:48 UTC

[GitHub] [bookkeeper] RaulGracia opened a new pull request #2779: Fix entrylog gc entrylogperledgerenabled

RaulGracia opened a new pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779


   ### Motivation
   
   Fix BookKeeper entry log GC when `entryLogPerLedgerEnabled` is activated, so that Bookies do not run out of ledger storage capacity.
   
   ### Changes
   
   The objective of this PR is to leave the GC algorithm untouched in the default case (`entryLogPerLedgerEnabled=false`) and to fix the GC when `entryLogPerLedgerEnabled=true`. By "fix", we mean that the GC process can garbage collect all the rotated entry log files that belong to ledgers that have already been deleted.
   
   _Context_: By default (`entryLogPerLedgerEnabled=false`), entry log files may contain entries from multiple ledgers. In the normal case, entries are added to the current entry log file; when it reaches a size limit, it is flushed and rotated, and a new entry log file is created to store the next entries. The logic for creating entry logs assumes that the initial id is 0 and that each new entry log file gets an id that is +1 compared to the previous one. This implies that all entry log files with an id lower than the current one have already been flushed and rotated, so they can safely be considered candidates for garbage collection if they are not related to any active ledger.
   
   _Problem_: The problem appears when `entryLogPerLedgerEnabled=true`, as it breaks some of the assumptions behind the approach to creating entry logs and garbage collecting them. With this option, each entry log file is related exclusively to one ledger id. While the entry log id scheme remains intact (i.e., ids of entry log files are produced sequentially starting from 0), the assumption that entry logs are rotated linearly no longer holds: we can have an interleaving of flushed and unflushed entry logs due to differences in the write throughput of the ledgers. However, the GC algorithm only scans the metadata of entry logs below `getLeastUnflushedEntryLog()`: https://github.com/apache/bookkeeper/blob/c8d4086697f85c9093f4da2c907a13e17c198914/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java#L589
   
   Therefore, if one entry log file takes a long time to be flushed and rotated, it will prevent the GC algorithm from garbage collecting entry log ids beyond that id. If entry logs are produced at a high rate beyond `getLeastUnflushedEntryLog()`, the Bookie may run out of ledger storage. This happens even though most of the entry logs beyond `getLeastUnflushedEntryLog()` may be candidates for garbage collection.
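   The stall described above can be illustrated with a minimal sketch. This is a hedged simplification, not the actual `extractMetaFromEntryLogs` code; the class and method names are illustrative only. The point is that candidates are restricted to ids strictly below the least unflushed log id, so one long-lived unflushed log blocks GC of every rotated log with a higher id:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch (not BookKeeper code): GC only considers entry log ids
// strictly below the least unflushed log id. A single slow-to-flush log with
// a low id therefore blocks collection of all rotated logs with higher ids.
class GcScanBoundSketch {
    static List<Long> gcScanCandidates(List<Long> rotatedLogIds, long leastUnflushedLogId) {
        return rotatedLogIds.stream()
                .filter(id -> id < leastUnflushedLogId)
                .collect(Collectors.toList());
    }
}
```

   For example, if logs 0, 1, 3 and 4 are rotated but log 2 is still unflushed, only logs 0 and 1 are ever scanned, no matter how many rotated logs accumulate above id 2.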
   
   _Solution_: The current proposal deals with the `entryLogPerLedgerEnabled=true` case while leaving the default case (`entryLogPerLedgerEnabled=false`) as is. The changes are the following:
   - When scanning entry log files, if `entryLogPerLedgerEnabled=true`, we attempt to scan entry log files up to the highest id. That is, the algorithm goes beyond `getLeastUnflushedEntryLog()`.
   - It is important not to consider for garbage collection the entry log files that have not been flushed and rotated. The reason is that entry logs may have in-memory data that is not yet synced to the drive. If we mistakenly considered an unflushed entry log file while scanning entry log metadata, it could be wrongly garbage collected, because its metadata in storage does not yet include the associated ledgers that are in memory. For this reason, we only consider entry logs in `RecentEntryLogsStatus` that meet the following condition: `entryLogsStatusMap.containsKey(entryLogId) && entryLogsStatusMap.get(entryLogId) || entryLogId < leastUnflushedLogId`. Note that `entryLogsStatusMap` is an in-memory map that may be lost after a Bookie restart. The assumption is that after a Bookie restart, the ledger metadata of all entry logs is restored from the journal, so it contains the last state they had in memory before the crash.
   - As we skip scanning entry log files that have not been flushed and rotated, the scan of entry logs for garbage collection should start from the first unflushed entry log found in the last iteration, to check whether that entry log is now a valid candidate for garbage collection.
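   The flushed-log condition above can be sketched with a simplified stand-in for `RecentEntryLogsStatus`. This is an assumption-laden sketch, not the actual BookKeeper class: the field and method names follow the description in this PR, and the bookkeeping of `leastUnflushedLogId` is a plausible simplification.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch (not the actual BookKeeper RecentEntryLogsStatus):
// an entry log is a GC-scan candidate only if it is known to be flushed
// and rotated, or if its id is below the least unflushed log id.
class RecentEntryLogsStatusSketch {
    // entryLogId -> true once that log has been flushed and rotated
    private final Map<Long, Boolean> entryLogsStatusMap = new ConcurrentHashMap<>();
    private long leastUnflushedLogId = 0;

    void createdEntryLog(long entryLogId) {
        entryLogsStatusMap.put(entryLogId, false);
    }

    void flushRotatedEntryLog(long entryLogId) {
        entryLogsStatusMap.put(entryLogId, true);
        // Advance the least-unflushed pointer over the contiguous flushed prefix,
        // dropping entries that are now implied by the pointer itself.
        while (Boolean.TRUE.equals(entryLogsStatusMap.get(leastUnflushedLogId))) {
            entryLogsStatusMap.remove(leastUnflushedLogId);
            leastUnflushedLogId++;
        }
    }

    // The condition quoted in the PR description.
    boolean isFlushedEntryLog(long entryLogId) {
        return (entryLogsStatusMap.containsKey(entryLogId) && entryLogsStatusMap.get(entryLogId))
                || entryLogId < leastUnflushedLogId;
    }
}
```

   With logs 0, 1 and 2 created, and 0 and 2 flushed but 1 still open (the per-ledger interleaving described above), log 2 is now eligible for the GC scan even though log 1 is not, which is exactly what the least-unflushed bound alone could not express.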
   
   ### How to verify it
   At the moment, we have run a 2-day workload with `entryLogPerLedgerEnabled=true` writing 1 GB/s. The original issue (ledger storage sporadically getting full in some Bookies) has not reproduced.
   We are in the process of analyzing the logs from these experiments and creating more unit tests to validate corner cases of this change.
   
   Master Issue: #2728


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] RaulGracia commented on a change in pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
RaulGracia commented on a change in pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779#discussion_r699442851



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       Thanks @Vanlightly, then I understand that this change is safe. 
   
   I also like the considerations regarding the refactoring; we have also been discussing this with @fpj. But I wonder if it would be better to first get the `entryLogPerLedgerEnabled` option working correctly (the goal of this PR), and to think in parallel about refactoring the GC logic and ledger storage separately, as that appears to be a larger change. Does that sound reasonable?







[GitHub] [bookkeeper] eolivelli merged pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
eolivelli merged pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779


   





[GitHub] [bookkeeper] Vanlightly commented on a change in pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
Vanlightly commented on a change in pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779#discussion_r699290224



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       When the journal is replayed it writes the entries in the same way as an add entry request received from a client.
   
   I think at some point it might be a good idea to hide the GC and compaction code entirely inside the ledger storage implementations. Different storage engines work in different ways and we're trying to make each conform to a standard set of interfaces/behaviours. ELPL would then be one more ledger storage engine that is free to do GC/compaction in any way it likes without worrying about impacting other storage engines or conforming to a common interface.










[GitHub] [bookkeeper] RaulGracia commented on a change in pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
RaulGracia commented on a change in pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779#discussion_r699167387



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       Sure, I can do that, no worries. 
   
   But I have a more fundamental question about this change that I would like to hear from (all of) you. I will try to summarize it:
   - By default, entry logs are all flushed with the exception of the current one.
   - With `entryLogPerLedgerEnabled`, there could be multiple unflushed entry logs across the id space.
   
   Now, let's imagine that the Bookie restarts and there is unflushed entry log data/metadata. In the default case, this situation would impact only the last entry log. With `entryLogPerLedgerEnabled`, there could be multiple entry logs impacted (i.e., data was in memory but not synced to disk).
   
   The question is: when a Bookie starts and replays the journal, would it take care of all the unflushed entry log data irrespective of whether it belongs to the last entry log or multiple entry logs? 
   
   My understanding is that the replay of the journal takes care of all the unflushed data of all entry logs (but I would like to confirm this with you). If this is true, then I think this PR is safe. In addition, our experiments inducing multiple Bookie restarts on purpose show that no unflushed entry log was wrongly garbage collected after the Bookie came back up.







[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779#discussion_r699153213



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       thinking out loud:
   is it possible to add some method in `entryLogger` (here and below) instead of referring to the BookieConfiguration?
   I find these added `conf.isEntryLogPerLedgerEnabled()` checks quite difficult to maintain in the future.
   Probably we can keep the fact that we have one entry log per ledger inside EntryLogger.
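   A hypothetical sketch of this suggestion, with assumed names (`getFinalEntryLogId` and the fields are illustrative, not existing BookKeeper API): the entry logger itself would decide the upper bound of the GC scan, so callers no longer consult the configuration directly.

```java
// Hypothetical sketch of the suggestion above: encapsulate the per-ledger-mode
// decision inside the entry logger. All names here are assumptions, not the
// actual BookKeeper API.
class EntryLoggerSketch {
    private final boolean entryLogPerLedgerEnabled;
    private final long leastUnflushedLogId;
    private final long lastLogId;

    EntryLoggerSketch(boolean entryLogPerLedgerEnabled, long leastUnflushedLogId, long lastLogId) {
        this.entryLogPerLedgerEnabled = entryLogPerLedgerEnabled;
        this.leastUnflushedLogId = leastUnflushedLogId;
        this.lastLogId = lastLogId;
    }

    // Upper bound of the GC metadata scan: the highest known id when each
    // ledger has its own entry log, otherwise the least unflushed id.
    long getFinalEntryLogId() {
        return entryLogPerLedgerEnabled ? lastLogId : leastUnflushedLogId;
    }
}
```

   The GC thread would then call a single method on the entry logger instead of branching on `conf.isEntryLogPerLedgerEnabled()` at each call site.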







[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779#discussion_r699435030



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       > I think at some point it might be a good idea to hide the GC and compaction code entirely inside the ledger storage implementations
   that is a good idea

##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       > I think at some point it might be a good idea to hide the GC and compaction code entirely inside the ledger storage implementations
   
   this is a good idea (for the future, not for this patch I mean)
















[GitHub] [bookkeeper] Vanlightly commented on a change in pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
Vanlightly commented on a change in pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779#discussion_r699290224



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       When the journal is replayed it writes the entries in the same way as an add entry request received from a client.
   
   I think at some point it might be a good idea to hide the GC and compaction code entirely inside the ledger storage implementations. Different storage engines work in different ways and we're trying to make each conform to a standard set of interfaces/behaviours. ELPL would then be one more ledger storage engine that is free to do GC/compaction in any way it likes without worrying about impacting other storage engines or conforming to a common interface.
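    For illustration only, a rough sketch of what such a refactoring could look like. The interface and class names below are hypothetical, not the actual BookKeeper API; the point is just that each storage engine would own its own reclamation policy behind a common interface:
    
    ```java
    // Hypothetical sketch: GC/compaction hidden inside each ledger storage
    // implementation, behind a shared interface. Names are illustrative.
    interface CompactableLedgerStorage {
        // One GC pass; the storage decides which entry logs are reclaimable.
        int gcPass();
    }
    
    class EntryLogPerLedgerStorage implements CompactableLedgerStorage {
        @Override
        public int gcPass() {
            // ELPL could consult per-log flush state here instead of assuming
            // a single least-unflushed id; returns the number of logs reclaimed.
            return 0;
        }
    }
    
    public class StorageSketch {
        public static void main(String[] args) {
            CompactableLedgerStorage storage = new EntryLogPerLedgerStorage();
            System.out.println(storage.gcPass());
        }
    }
    ```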







[GitHub] [bookkeeper] RaulGracia commented on a change in pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
RaulGracia commented on a change in pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779#discussion_r699167387



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       Sure, I can do that, no worries. 
   
   But I have a more fundamental question about this change that I would like to hear from (all of) you. I will try to summarize it:
   - By default, entry logs are all flushed with the exception of the current one.
   - With `entryLogPerLedgerEnabled`, there could be multiple unflushed entry logs across the id space.
   
   Now, let's imagine that the Bookie restarts and there is unflushed entry log data/metadata. In the default case, this situation would impact only the last entry log. In the case of `entryLogPerLedgerEnabled`, multiple entry logs could be impacted (i.e., data was in memory but not synced to disk).
   
   The question is: when a Bookie starts and replays the journal, would it take care of all the unflushed entry log data irrespective of whether it belongs to the last entry log or multiple entry logs? 
   
   My understanding is that the replay of the journal takes care of all the unflushed data of all entry logs (but I would like to confirm this with you). If this is true, then I think that this PR is safe. In addition, our experiments inducing multiple Bookie restarts on purpose show that no unflushed entry log was wrongly garbage collected after the Bookie came back up.
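   
   To make the two cases concrete, here is a tiny sketch of how the GC scan bound differs between the modes. The constants and names are illustrative stand-ins, not the actual `EntryLogger` fields:
   
   ```java
   import java.util.function.Supplier;
   
   public class ScanBoundSketch {
       // Illustrative state: logs 0..4 are flushed, log 5 is not, but logs
       // 6..9 also exist (possible under entryLogPerLedgerEnabled, where
       // flushed and unflushed logs are interleaved across the id space).
       static final long LEAST_UNFLUSHED_LOG_ID = 5;
       static final long LAST_LOG_ID = 9;
   
       // Mirrors the idea in the patch: scan up to the highest known id when
       // entryLogPerLedgerEnabled, otherwise only below the least-unflushed id.
       static long scanBound(boolean entryLogPerLedgerEnabled) {
           Supplier<Long> finalEntryLog = () ->
                   entryLogPerLedgerEnabled ? LAST_LOG_ID : LEAST_UNFLUSHED_LOG_ID;
           return finalEntryLog.get();
       }
   
       public static void main(String[] args) {
           System.out.println(scanBound(false)); // 5: only ids 0..4 are scanned
           System.out.println(scanBound(true));  // 9: all ids up to 9 considered
       }
   }
   ```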

##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       Thanks @Vanlightly, then I understand that this change is safe. 
   
   I also like the considerations regarding the refactoring; we have been discussing this with @fpj as well. But I wonder if it would be better to first get the `entryLogPerLedgerEnabled` option working correctly (the goal of this PR), and think in parallel about refactoring the GC logic and ledger storage separation, as that appears to be a larger change. Does that sound reasonable?



