Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2021/08/28 16:00:48 UTC

[GitHub] [bookkeeper] RaulGracia opened a new pull request #2779: Fix entrylog gc entrylogperledgerenabled

RaulGracia opened a new pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779


   ### Motivation
   
   Fix BookKeeper entry log GC when `entryLogPerLedgerEnabled` is activated, so that Bookies do not run out of ledger storage capacity.
   
   ### Changes
   
   The objective of this PR is to leave the GC algorithm untouched in the default case (`entryLogPerLedgerEnabled=false`) and to fix the GC when `entryLogPerLedgerEnabled=true`. By "fix", we mean that the GC process can garbage collect all the rotated entry log files that belong to ledgers that have already been deleted.
   
   _Context_: By default (`entryLogPerLedgerEnabled=false`), entry log files may contain entries from multiple ledgers. In the normal case, entries are added to the current entry log file; when it reaches a size limit, it is flushed and rotated, and a new entry log file is created to store the next entries. The logic for creating entry logs assumes that the initial id is 0 and that each new entry log file gets an id that is +1 compared to the previous one. This implies that all entry log files with an id lower than the current one have already been flushed and rotated, so they can safely be considered candidates for garbage collection if they are not related to any active ledger.
   
   _Problem_: The problem appears when `entryLogPerLedgerEnabled=true`, as it breaks some of the assumptions behind the approach to creating entry logs and garbage collecting them. With this option, each entry log file is related exclusively to one ledger id. While the entry log id scheme remains intact (i.e., ids of entry log files are produced sequentially starting from 0), the assumption that entry logs are rotated linearly no longer holds: we can have an interleaving of flushed and unflushed entry logs due to differences in the write throughput of the ledgers. However, the GC algorithm only scans the metadata of entry logs below `getLeastUnflushedEntryLog()`: https://github.com/apache/bookkeeper/blob/c8d4086697f85c9093f4da2c907a13e17c198914/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java#L589
   
   Therefore, if one entry log file takes a long time to be flushed and rotated, it will prevent the GC algorithm from garbage collecting entry log ids beyond that id. If entry logs are produced at a high rate beyond `getLeastUnflushedEntryLog()`, the Bookie may run out of ledger storage. This happens even though most of the entry logs beyond `getLeastUnflushedEntryLog()` may be candidates for garbage collection.
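   The stall described above can be illustrated with a minimal sketch. This is a hedged simplification, not the actual `extractMetaFromEntryLogs` code; the class and method names are illustrative only. The point is that candidates are restricted to ids strictly below the least unflushed log id, so one long-lived unflushed log blocks GC of every rotated log with a higher id:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch (not BookKeeper code): GC only considers entry log ids
// strictly below the least unflushed log id. A single slow-to-flush log with
// a low id therefore blocks collection of all rotated logs with higher ids.
class GcScanBoundSketch {
    static List<Long> gcScanCandidates(List<Long> rotatedLogIds, long leastUnflushedLogId) {
        return rotatedLogIds.stream()
                .filter(id -> id < leastUnflushedLogId)
                .collect(Collectors.toList());
    }
}
```

   For example, if logs 0, 1, 3 and 4 are rotated but log 2 is still unflushed, only logs 0 and 1 are ever scanned, no matter how many rotated logs accumulate above id 2.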
   
   _Solution_: The current proposal deals with the `entryLogPerLedgerEnabled=true` case while leaving the default case (`entryLogPerLedgerEnabled=false`) as is. The changes are the following:
   - When scanning entry log files, if `entryLogPerLedgerEnabled=true`, we attempt to scan entry log files up to the highest id. That is, the algorithm goes beyond `getLeastUnflushedEntryLog()`.
   - It is important not to consider for garbage collection the entry log files that have not been flushed and rotated. The reason is that entry logs may have in-memory data that is not yet synced to the drive. If we mistakenly considered an unflushed entry log file while scanning entry log metadata, it could be wrongly garbage collected, because its metadata in storage does not yet include the associated ledgers that are in memory. For this reason, we only consider entry logs in `RecentEntryLogsStatus` that meet the following condition: `entryLogsStatusMap.containsKey(entryLogId) && entryLogsStatusMap.get(entryLogId) || entryLogId < leastUnflushedLogId`. Note that `entryLogsStatusMap` is an in-memory map that may be lost after a Bookie restart. The assumption is that after a Bookie restart, the ledger metadata of all entry logs is restored from the journal, so it contains the last state they had in memory before the crash.
   - As we skip scanning entry log files that have not been flushed and rotated, the scan of entry logs for garbage collection should start from the first unflushed entry log found in the last iteration, to check whether that entry log is now a valid candidate for garbage collection.
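   The flushed-log condition above can be sketched with a simplified stand-in for `RecentEntryLogsStatus`. This is an assumption-laden sketch, not the actual BookKeeper class: the field and method names follow the description in this PR, and the bookkeeping of `leastUnflushedLogId` is a plausible simplification.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch (not the actual BookKeeper RecentEntryLogsStatus):
// an entry log is a GC-scan candidate only if it is known to be flushed
// and rotated, or if its id is below the least unflushed log id.
class RecentEntryLogsStatusSketch {
    // entryLogId -> true once that log has been flushed and rotated
    private final Map<Long, Boolean> entryLogsStatusMap = new ConcurrentHashMap<>();
    private long leastUnflushedLogId = 0;

    void createdEntryLog(long entryLogId) {
        entryLogsStatusMap.put(entryLogId, false);
    }

    void flushRotatedEntryLog(long entryLogId) {
        entryLogsStatusMap.put(entryLogId, true);
        // Advance the least-unflushed pointer over the contiguous flushed prefix,
        // dropping entries that are now implied by the pointer itself.
        while (Boolean.TRUE.equals(entryLogsStatusMap.get(leastUnflushedLogId))) {
            entryLogsStatusMap.remove(leastUnflushedLogId);
            leastUnflushedLogId++;
        }
    }

    // The condition quoted in the PR description.
    boolean isFlushedEntryLog(long entryLogId) {
        return (entryLogsStatusMap.containsKey(entryLogId) && entryLogsStatusMap.get(entryLogId))
                || entryLogId < leastUnflushedLogId;
    }
}
```

   With logs 0, 1 and 2 created, and 0 and 2 flushed but 1 still open (the per-ledger interleaving described above), log 2 is now eligible for the GC scan even though log 1 is not, which is exactly what the least-unflushed bound alone could not express.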
   
   ### How to verify it
   At the moment, we have run a 2-day workload with `entryLogPerLedgerEnabled=true` writing 1 GB/s. The original issue (ledger storage sporadically getting full in some Bookies) has not reproduced.
   We are in the process of analyzing the logs from these experiments and creating more unit tests to validate corner cases of this change.
   
   Master Issue: #2728


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] RaulGracia commented on a change in pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
RaulGracia commented on a change in pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779#discussion_r699442851



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       Thanks @Vanlightly, then I understand that this change is safe. 
   
   I also like the considerations regarding the refactoring; we have also been discussing this with @fpj. But I wonder if it would be better to first get the `entryLogPerLedgerEnabled` option working correctly (the goal of this PR), and to think in parallel about refactoring the GC logic and ledger storage separately, as that appears to be a larger change. Does that sound reasonable?







[GitHub] [bookkeeper] eolivelli merged pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
eolivelli merged pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779


   





[GitHub] [bookkeeper] Vanlightly commented on a change in pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
Vanlightly commented on a change in pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779#discussion_r699290224



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       When the journal is replayed it writes the entries in the same way as an add entry request received from a client.
   
   I think at some point it might be a good idea to hide the GC and compaction code entirely inside the ledger storage implementations. Different storage engines work in different ways and we're trying to make each conform to a standard set of interfaces/behaviours. ELPL would then be one more ledger storage engine that is free to do GC/compaction in any way it likes without worrying about impacting other storage engines or conforming to a common interface.










[GitHub] [bookkeeper] RaulGracia commented on a change in pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
RaulGracia commented on a change in pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779#discussion_r699167387



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       Sure, I can do that, no worries. 
   
   But I have a more fundamental question about this change that I would like to hear from (all of) you. I will try to summarize it:
   - By default, entry logs are all flushed with the exception of the current one.
   - With `entryLogPerLedgerEnabled`, there could be multiple unflushed entry logs across the id space.
   
   Now, let's imagine that the Bookie restarts and there is unflushed entry log data/metadata. In the default case, this situation would impact only the last entry log. With `entryLogPerLedgerEnabled`, there could be multiple entry logs impacted (i.e., data was in memory but not synced to disk).
   
   The question is: when a Bookie starts and replays the journal, would it take care of all the unflushed entry log data irrespective of whether it belongs to the last entry log or multiple entry logs? 
   
   My understanding is that the replay of the journal takes care of all the unflushed data of all entry logs (but I would like to confirm this with you). If this is true, then I think this PR is safe. In addition, our experiments inducing multiple Bookie restarts on purpose show that no unflushed entry log was wrongly garbage collected after the Bookie came back up.







[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779#discussion_r699153213



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       thinking out loud:
   is it possible to add some method in `entryLogger` (here and below) instead of referring to the BookieConfiguration?
   I find these added `conf.isEntryLogPerLedgerEnabled()` checks quite difficult to maintain in the future.
   Probably we can keep the fact that we have one entry log per ledger inside EntryLogger.
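   A hypothetical sketch of this suggestion, with assumed names (`getFinalEntryLogId` and the fields are illustrative, not existing BookKeeper API): the entry logger itself would decide the upper bound of the GC scan, so callers no longer consult the configuration directly.

```java
// Hypothetical sketch of the suggestion above: encapsulate the per-ledger-mode
// decision inside the entry logger. All names here are assumptions, not the
// actual BookKeeper API.
class EntryLoggerSketch {
    private final boolean entryLogPerLedgerEnabled;
    private final long leastUnflushedLogId;
    private final long lastLogId;

    EntryLoggerSketch(boolean entryLogPerLedgerEnabled, long leastUnflushedLogId, long lastLogId) {
        this.entryLogPerLedgerEnabled = entryLogPerLedgerEnabled;
        this.leastUnflushedLogId = leastUnflushedLogId;
        this.lastLogId = lastLogId;
    }

    // Upper bound of the GC metadata scan: the highest known id when each
    // ledger has its own entry log, otherwise the least unflushed id.
    long getFinalEntryLogId() {
        return entryLogPerLedgerEnabled ? lastLogId : leastUnflushedLogId;
    }
}
```

   The GC thread would then call a single method on the entry logger instead of branching on `conf.isEntryLogPerLedgerEnabled()` at each call site.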







[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779#discussion_r699435030



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       > I think at some point it might be a good idea to hide the GC and compaction code entirely inside the ledger storage implementations
   that is a good idea

##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       > I think at some point it might be a good idea to hide the GC and compaction code entirely inside the ledger storage implementations
   
   this is a good idea (for the future, not for this patch I mean)
















[GitHub] [bookkeeper] Vanlightly commented on a change in pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
Vanlightly commented on a change in pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779#discussion_r699290224



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       When the journal is replayed it writes the entries in the same way as an add entry request received from a client.
   
   I think at some point it might be a good idea to hide the GC and compaction code entirely inside the ledger storage implementations. Different storage engines work in different ways and we're trying to make each conform to a standard set of interfaces/behaviours. ELPL would then be one more ledger storage engine that is free to do GC/compaction in any way it likes without worrying about impacting other storage engines or conforming to a common interface.
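    For illustration only, a rough sketch of what such a refactoring could look like. The interface and class names below are hypothetical, not the actual BookKeeper API; the point is just that each storage engine would own its own reclamation policy behind a common interface:
    
    ```java
    // Hypothetical sketch: GC/compaction hidden inside each ledger storage
    // implementation, behind a shared interface. Names are illustrative.
    interface CompactableLedgerStorage {
        // One GC pass; the storage decides which entry logs are reclaimable.
        int gcPass();
    }
    
    class EntryLogPerLedgerStorage implements CompactableLedgerStorage {
        @Override
        public int gcPass() {
            // ELPL could consult per-log flush state here instead of assuming
            // a single least-unflushed id; returns the number of logs reclaimed.
            return 0;
        }
    }
    
    public class StorageSketch {
        public static void main(String[] args) {
            CompactableLedgerStorage storage = new EntryLogPerLedgerStorage();
            System.out.println(storage.gcPass());
        }
    }
    ```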







[GitHub] [bookkeeper] RaulGracia commented on a change in pull request #2779: Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option

Posted by GitBox <gi...@apache.org>.
RaulGracia commented on a change in pull request #2779:
URL: https://github.com/apache/bookkeeper/pull/2779#discussion_r699167387



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       Sure, I can do that, no worries. 
   
   But I have a more fundamental question about this change that I would like to hear from (all of) you. I will try to summarize it:
   - By default, entry logs are all flushed with the exception of the current one.
   - With `entryLogPerLedgerEnabled`, there could be multiple unflushed entry logs across the id space.
   
   Now, let's imagine that the Bookie restarts and there is unflushed entry log data/metadata. In the default case, this situation would impact only the last entry log. In the case of `entryLogPerLedgerEnabled`, multiple entry logs could be impacted (i.e., data was in memory but not synced to disk).
   
   The question is: when a Bookie starts and replays the journal, would it take care of all the unflushed entry log data irrespective of whether it belongs to the last entry log or multiple entry logs? 
   
   My understanding is that the replay of the journal takes care of all the unflushed data of all entry logs (but I would like to confirm this with you). If this is true, then I think that this PR is safe. In addition, our experiments inducing multiple Bookie restarts on purpose show that no unflushed entry log was wrongly garbage collected after the Bookie came back up.
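   
   To make the two cases concrete, here is a tiny sketch of how the GC scan bound differs between the modes. The constants and names are illustrative stand-ins, not the actual `EntryLogger` fields:
   
   ```java
   import java.util.function.Supplier;
   
   public class ScanBoundSketch {
       // Illustrative state: logs 0..4 are flushed, log 5 is not, but logs
       // 6..9 also exist (possible under entryLogPerLedgerEnabled, where
       // flushed and unflushed logs are interleaved across the id space).
       static final long LEAST_UNFLUSHED_LOG_ID = 5;
       static final long LAST_LOG_ID = 9;
   
       // Mirrors the idea in the patch: scan up to the highest known id when
       // entryLogPerLedgerEnabled, otherwise only below the least-unflushed id.
       static long scanBound(boolean entryLogPerLedgerEnabled) {
           Supplier<Long> finalEntryLog = () ->
                   entryLogPerLedgerEnabled ? LAST_LOG_ID : LEAST_UNFLUSHED_LOG_ID;
           return finalEntryLog.get();
       }
   
       public static void main(String[] args) {
           System.out.println(scanBound(false)); // 5: only ids 0..4 are scanned
           System.out.println(scanBound(true));  // 9: all ids up to 9 considered
       }
   }
   ```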

##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
##########
@@ -583,12 +584,15 @@ protected void compactEntryLog(EntryLogMetadata entryLogMeta) {
      * @throws IOException
      */
     protected Map<Long, EntryLogMetadata> extractMetaFromEntryLogs(Map<Long, EntryLogMetadata> entryLogMetaMap) {
-        // Extract it for every entry log except for the current one.
-        // Entry Log ID's are just a long value that starts at 0 and increments
-        // by 1 when the log fills up and we roll to a new one.
-        long curLogId = entryLogger.getLeastUnflushedLogId();
+        // Entry Log ID's are just a long value that starts at 0 and increments by 1 when the log fills up and we roll
+        // to a new one. We scan entry logs as follows:
+        // - entryLogPerLedgerEnabled is false: Extract it for every entry log except for the current one (un-flushed).
+        // - entryLogPerLedgerEnabled is true: Scan all flushed entry logs up to the highest known id.
+        Supplier<Long> finalEntryLog = () -> conf.isEntryLogPerLedgerEnabled() ? entryLogger.getLastLogId() :

Review comment:
       Thanks @Vanlightly, then I understand that this change is safe. 
   
   I also like the considerations regarding the refactoring; we have been discussing this with @fpj as well. But I wonder if it would be better to first get the `entryLogPerLedgerEnabled` option working correctly (the goal of this PR), and think in parallel about refactoring the GC logic and ledger storage separation, as that appears to be a larger change. Does that sound reasonable?



