You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2022/04/08 11:09:34 UTC

[GitHub] [bookkeeper] hangc0276 opened a new pull request, #3192: Fix compaction throttle imprecise

hangc0276 opened a new pull request, #3192:
URL: https://github.com/apache/bookkeeper/pull/3192

   ### Motivation
   When do compaction for an entry log file, it will scan each entry header metadata from beginning to the end of the file. When an entry doesn't deleted, it will read the entry data and write into a new entry log file. The throttle only works on the entry which doesn't deleted.
   https://github.com/apache/bookkeeper/blob/b0030ab5d9132e5c5cd5092b8e6dab79e9fbf16c/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java#L1012-L1051
   
   For those deleted entries, the throttle strategy doesn't applied. However, we use `BufferedChannel` to read the entry log file which will prefetch data from file when the read buffer missed.
   https://github.com/apache/bookkeeper/blob/b0030ab5d9132e5c5cd5092b8e6dab79e9fbf16c/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/BufferedReadChannel.java#L91-L97
   
   For an entry log file, more than 90% of entries has been deleted, the compactor will scan those entry's header metadata one by one. When reading one entry's metadata, it will missed in BufferedChannel read buffer, it will trigger prefetch from disk. For the following entries, the header metadata reading will also missed in BufferedChannel read buffer, and will continue to prefetch from disk without throttle. It will lead to ledger disk IO util runs high. 
   
   Moreover, for each prefetch operation from disk, it will also trigger OS PageCache prefetch. For the compaction model, the OS PageCache prefetch will lead to PageCache pollution,and maybe also affect the journal sync latency. For this one, we can use the Direct IO to reduce the PageCache effect. https://github.com/apache/bookkeeper/issues/2943
   
   ### Changes
   Add throttle on entry header metadata check stage.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] zymap commented on pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
zymap commented on PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#issuecomment-1097453358

   @hangc0276 Looks like the compaction tests failed. Could you please take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#issuecomment-1094246341

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#issuecomment-1097464831

   > @hangc0276 Looks like the compaction tests failed. Could you please take a look?
   
   Ok, I will take a look today.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#issuecomment-1098087867

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#issuecomment-1093818553

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#issuecomment-1109302185

   > Good work
   > 
   > We definitely need tests about these new methods
   
   @eolivelli  I removed the duplicated new methods, and return the prefetched bytes when calling `BufferedReadChannel.read()`. So we do not need extra unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] StevenLuMT commented on a diff in pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
StevenLuMT commented on code in PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#discussion_r930092677


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/BufferedReadChannel.java:
##########
@@ -61,17 +63,18 @@ public BufferedReadChannel(FileChannel fileChannel, int readCapacity) {
      *         -1 if the given position is greater than or equal to the file's current size.

Review Comment:
   I think @hangc0276 you should update this description for method's return 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3192: Fix compaction throttle imprecise

Posted by "hangc0276 (via GitHub)" <gi...@apache.org>.
hangc0276 commented on PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#issuecomment-1568101353

   The compaction introduced performance issue has been addressed by https://github.com/apache/bookkeeper/pull/3959, close this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 closed pull request #3192: Fix compaction throttle imprecise

Posted by "hangc0276 (via GitHub)" <gi...@apache.org>.
hangc0276 closed pull request #3192: Fix compaction throttle imprecise
URL: https://github.com/apache/bookkeeper/pull/3192


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#issuecomment-1093010936

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] dlg99 commented on a diff in pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
dlg99 commented on code in PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#discussion_r858327213


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/BufferedReadChannel.java:
##########
@@ -95,9 +98,10 @@ public synchronized int read(ByteBuf dest, long pos, int length) throws IOExcept
                     throw new IOException("Reading from filechannel returned a non-positive value. Short read.");
                 }
                 readBuffer.writerIndex(readBytes);
+                bytesFetchedFromFile += readBytes;
             }
         }
-        return (int) (currentPosition - pos);
+        return new ImmutablePair<> ((int) (currentPosition - pos), bytesFetchedFromFile);

Review Comment:
   This is a very frequently called code, can we avoid creating a new object?
   I think the throttler can be safely passed down the call chain into this method. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#issuecomment-1098148230

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] zymap commented on a diff in pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
zymap commented on code in PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#discussion_r846862216


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java:
##########
@@ -1014,10 +1041,18 @@ public void scanEntryLog(long entryLogId, EntryLogScanner scanner) throws IOExce
                 if (pos >= bc.size()) {
                     break;
                 }
-                if (readFromLogChannel(entryLogId, bc, headerBuffer, pos) != headerBuffer.capacity()) {
+
+                Pair<Integer, Integer> readBytesPair =
+                    readFromLogChannelWithPrefetchBytes(entryLogId, bc, headerBuffer, pos);
+                if (readBytesPair.getLeft() != headerBuffer.capacity()) {
                     LOG.warn("Short read for entry size from entrylog {}", entryLogId);
                     return;
                 }
+
+                if (throttler != null) {
+                    throttler.acquire(readBytesPair.getRight());

Review Comment:
   When do we release it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on a diff in pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on code in PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#discussion_r858372194


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/BufferedReadChannel.java:
##########
@@ -95,9 +98,10 @@ public synchronized int read(ByteBuf dest, long pos, int length) throws IOExcept
                     throw new IOException("Reading from filechannel returned a non-positive value. Short read.");
                 }
                 readBuffer.writerIndex(readBytes);
+                bytesFetchedFromFile += readBytes;
             }
         }
-        return (int) (currentPosition - pos);
+        return new ImmutablePair<> ((int) (currentPosition - pos), bytesFetchedFromFile);

Review Comment:
   Good catch. I'm considering how to avoiding creating new object. 
   The `BufferedReadChannel.read` method is synchronized, we should use throttle in the read method to block it when throttled, otherwise, it will block other threading calling.  I still have no good idea about it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] StevenLuMT commented on a diff in pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
StevenLuMT commented on code in PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#discussion_r930132496


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/BufferedReadChannel.java:
##########
@@ -95,9 +98,10 @@ public synchronized int read(ByteBuf dest, long pos, int length) throws IOExcept
                     throw new IOException("Reading from filechannel returned a non-positive value. Short read.");
                 }
                 readBuffer.writerIndex(readBytes);
+                bytesFetchedFromFile += readBytes;
             }
         }
-        return (int) (currentPosition - pos);
+        return new ImmutablePair<> ((int) (currentPosition - pos), bytesFetchedFromFile);

Review Comment:
   @hangc0276 @dlg99 
   I think you can write like this simple code :
   `return Pair.of((int) (currentPosition - pos), bytesFetchedFromFile); `
   instead of 
   `return new ImmutablePair<> ((int) (currentPosition - pos), bytesFetchedFromFile); `
   
   but I think it's ok, frequent creation of ImmutablePair has no affect 
   <img width="964" alt="image" src="https://user-images.githubusercontent.com/42990025/181050024-667fd915-ba55-4de9-9bb1-da395b60391e.png">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on a diff in pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on code in PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#discussion_r846881714


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java:
##########
@@ -1014,10 +1041,18 @@ public void scanEntryLog(long entryLogId, EntryLogScanner scanner) throws IOExce
                 if (pos >= bc.size()) {
                     break;
                 }
-                if (readFromLogChannel(entryLogId, bc, headerBuffer, pos) != headerBuffer.capacity()) {
+
+                Pair<Integer, Integer> readBytesPair =
+                    readFromLogChannelWithPrefetchBytes(entryLogId, bc, headerBuffer, pos);
+                if (readBytesPair.getLeft() != headerBuffer.capacity()) {
                     LOG.warn("Short read for entry size from entrylog {}", entryLogId);
                     return;
                 }
+
+                if (throttler != null) {
+                    throttler.acquire(readBytesPair.getRight());

Review Comment:
   The release operation controlled by `RateLimiter` library in guava



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#issuecomment-1093559869

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#issuecomment-1094153786

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#issuecomment-1109392133

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] StevenLuMT commented on pull request #3192: Fix compaction throttle imprecise

Posted by GitBox <gi...@apache.org>.
StevenLuMT commented on PR #3192:
URL: https://github.com/apache/bookkeeper/pull/3192#issuecomment-1225342932

   fix old workflow,please see #3455 for detail


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org