You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2022/04/18 10:36:44 UTC

[GitHub] [bookkeeper] massakam opened a new pull request, #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

massakam opened a new pull request, #3214:
URL: https://github.com/apache/bookkeeper/pull/3214

   ### Motivation
   
   The read throttling feature in `Auditor#checkAllLedgers` added by #2973, but when it was enabled `checkAllLedgers` got stuck.
   
   The cause is that, as mentioned in #3070, all the `BookKeeperClientWorker-OrderedExecutor` threads wait for the semaphore to be released, and there is no thread that can execute `EntryExistsCallback#readEntryComplete` where the semaphore is released.
   
   ### Changes
   
   Add a new thread pool to the `LedgerChecker` class. The `BookKeeperClientWorker-OrderedExecutor` threads threads delegate subsequent processing (this includes waiting for the semaphore to be released) to threads in this thread pool as soon as they release the semaphore.
   
   This will prevent the `BookKeeperClientWorker-OrderedExecutor` from getting stuck due to waiting for the semaphore to be released.
   
   Master Issue: #3070


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1121870554

   @dlg99 PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] dlg99 commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
dlg99 commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1255580520

   @massakam you are absolutely right, that was an extra release. 
   So, we should not call checkFragments() in the callback that runs on the bookie's ordered executor (acquire blocks a chance to run of the bookie callback that releases permit).
   https://gist.github.com/dlg99/3e1524f0804c4688ccb31e3300b89c4e basically your initial idea with the executor, needs other tests to close LedgerChecker.
   This one: https://gist.github.com/dlg99/28f07f3bcd3f71000b1faba6b906fb09 is kind of the same but uses common pool executor (via whenCompleteAsync)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1200360593

   @massakam  When user configured `inFlightReadEntryNumInLedgerChecker` instead of `readEntryRateInLedgerChecker`, it will still lead to checkAllLedger gets stuck, right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] StevenLuMT commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
StevenLuMT commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1226678843

   > > fix old workflow,please see #3455 for detail
   > 
   > It should have already been fixed.
   
   great,thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1226675645

   > fix old workflow,please see https://github.com/apache/bookkeeper/issues/3455 for detail
   
   It should have already been fixed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1253898953

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1109556019

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] StevenLuMT commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
StevenLuMT commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1244806572

   > @StevenLuMT I think you are probably referring to a past commit.
   
   yes, I have approved this pr,
   you can hide my old comments


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1253559671

   @dlg99 
   > This gist https://gist.github.com/dlg99/505849e1010a20c6d439ecd53f500a85 is more of a demo of what's going on than actual fix (after all, it breaks `testShouldGetTwoFrgamentsIfTwoBookiesFailedInSameEnsemble` - might be a test issue).
   
   I have a question about this gist. Why is `releasePermit()` added on line 79 of LedgerChecker.java?
   https://gist.github.com/dlg99/505849e1010a20c6d439ecd53f500a85#file-pr3214-diff-L79
   
   `EntryExistsCallback#readEntryComplete` also calls `releasePermit()`, so it seems that one extra permit is released.
   https://github.com/apache/bookkeeper/blob/7004d994938e914561d8343a96d9d75ce98b2837/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerChecker.java#L321
   
   And if `releasePermit()` on line 75 of LedgerChecker.java is removed, `TestLedgerChecker` times out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] eolivelli merged pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
eolivelli merged PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1253572510

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1253560310

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1256412616

   >  This one: https://gist.github.com/dlg99/28f07f3bcd3f71000b1faba6b906fb09 is kind of the same but uses common pool executor (via whenCompleteAsync)
   
   Adopted this approach and modified the code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] dlg99 commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
dlg99 commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1254003527

   > Why is releasePermit() added on line 79 of LedgerChecker.java?
   It is in the callback.
   Permits requested before bookieClient.readEntry called, and must be released in the callback passed there.
   
   ```java
                   final long entryToRead = curEntryId;
   
                   final EntryExistsCallback eecb = new EntryExistsCallback(lh.getLedgerMetadata().getWriteQuorumSize(),
                                                 new GenericCallback<Boolean>() {
                                                     @Override
                                                     public void operationComplete(int rc, Boolean result) {
                                                         releasePermit();
                                                         if (result) {
                                                             fragments.add(lastLedgerFragment);
                                                         }
                                                         checkFragments(fragments, cb,
                                                             percentageOfLedgerFragmentToBeVerified);
                                                     }
                                                 });
   
                   try {
                       DistributionSchedule.WriteSet writeSet = lh.getDistributionSchedule().getWriteSet(entryToRead);
                       acquirePermits(writeSet.size());
                       for (int i = 0; i < writeSet.size(); i++) {
                               BookieId addr = curEnsemble.get(writeSet.get(i));
                               bookieClient.readEntry(addr, lh.getId(), entryToRead,
                                       eecb, null, BookieProtocol.FLAG_NONE);
                       }
                       writeSet.recycle();
                   } catch (InterruptedException e) {
                       LOG.error("InterruptedException when checking entry : {}", entryToRead, e);
                   }
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1200976673

   PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1255997272

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1109654380

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] gaozhangmin commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
gaozhangmin commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1118123312

   > @dlg99 Changed the implementation to achieve throttling using `RateLimiter` instead of `Semaphore`. What do you think?
   > 
   > I think this is simpler than the implementation of adding a new flag and requesting/releasing a permit according to its value.
   
   +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] StevenLuMT commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
StevenLuMT commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1240322646

   @hangc0276  have a look, please re-review this pr, thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1200860352

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on a diff in pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on code in PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#discussion_r969802886


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerChecker.java:
##########
@@ -159,6 +166,8 @@ public LedgerChecker(BookieClient client, BookieWatcher watcher, int inFlightRea
         } else {
             semaphore = null;
         }
+        executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors(),

Review Comment:
   Do we need `Runtime.getRuntime().availableProcessors()` threads to deal with BookieLedgerChecker callback?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on a diff in pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on code in PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#discussion_r906647070


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java:
##########
@@ -213,7 +213,7 @@ public class ServerConfiguration extends AbstractConfiguration<ServerConfigurati
         "auditorMaxNumberOfConcurrentOpenLedgerOperations";
     protected static final String AUDITOR_ACQUIRE_CONCURRENT_OPEN_LEDGER_OPERATIONS_TIMEOUT_MSEC =
         "auditorAcquireConcurrentOpenLedgerOperationsTimeOutMSec";
-    protected static final String IN_FLIGHT_READ_ENTRY_NUM_IN_LEDGER_CHECKER = "inFlightReadEntryNumInLedgerChecker";
+    protected static final String READ_ENTRY_RATE_IN_LEDGER_CHECKER = "readEntryRateInLedgerChecker";

Review Comment:
   @hangc0276 I don't think the value of `inFlightReadEntryNumInLedgerChecker` should be set to `readEntryRateInLedgerChecker`, as the two settings have slightly different meanings. Therefore, I decided to restore the old setting. If both `readEntryRateInLedgerChecker` and `inFlightReadEntryNumInLedgerChecker` are set, `readEntryRateInLedgerChecker` takes precedence and `inFlightReadEntryNumInLedgerChecker` is ignored.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1163931065

   Another option I have is to revert to the first approach with the extra executor.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1256251059

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1109688994

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1101920877

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] dlg99 commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
dlg99 commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1161990054

   @massakam I think that Throttle is not a good approach here.
   
   Requests-in-progress (RIPs) approach is very similar when everything works but easier to tune and self-adjusts to remote system's dynamically changing capabilities while throttle is impossible to tuen for that.
   Imagine you configured 20 RIPs and one request takes 100ms normally. Now you have 200 RPS (requests per second).
   Remote system (bookie) slowed down and now take 500ms to process a request, now you are down to 40RPS without overloading the bookie, changing config etc.
   Bookie speeds up and can process request in 10ms - the rate dynamically goes up to 2000rps.
   
   Let's use throttle now: set it to 200 request per second.
   In the normal case everything looks ok and the same as in normal case above.
   Now the bookie slows down.
   Throttle keeps on letting through 200 RPS even though the bookie can only handle 40, with that requests will pile up and eventually make result in OOM on bookie or client side. 
   Bookie speeds up and can process request in 10ms - the rate stays at 200rps.
   
   I'd prefer to track down where the semaphore is not released and fix that.
   
   Throttle is simple, does not require release etc. but it has its drawback and was not used there for a reason.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1166211686

   PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on a diff in pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on code in PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#discussion_r905964413


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java:
##########
@@ -213,7 +213,7 @@ public class ServerConfiguration extends AbstractConfiguration<ServerConfigurati
         "auditorMaxNumberOfConcurrentOpenLedgerOperations";
     protected static final String AUDITOR_ACQUIRE_CONCURRENT_OPEN_LEDGER_OPERATIONS_TIMEOUT_MSEC =
         "auditorAcquireConcurrentOpenLedgerOperationsTimeOutMSec";
-    protected static final String IN_FLIGHT_READ_ENTRY_NUM_IN_LEDGER_CHECKER = "inFlightReadEntryNumInLedgerChecker";
+    protected static final String READ_ENTRY_RATE_IN_LEDGER_CHECKER = "readEntryRateInLedgerChecker";

Review Comment:
   The PR https://github.com/apache/bookkeeper/pull/2973 has been released in BookKeeper 4.15.0, we'd better mark the old one as deprecated instead of removing it. And set the old value to the new one if the user sets the old one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1241431548

   @StevenLuMT I think you are probably referring to a past commit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on a diff in pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on code in PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#discussion_r970397565


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerChecker.java:
##########
@@ -159,6 +166,8 @@ public LedgerChecker(BookieClient client, BookieWatcher watcher, int inFlightRea
         } else {
             semaphore = null;
         }
+        executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors(),

Review Comment:
   @hangc0276 We don't necessarily need threads equal to the number of processors. I thought it was strange to specify a fixed value, so I just set `Runtime.getRuntime().availableProcessors()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] dlg99 commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
dlg99 commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1109162512

   @massakam is it possible to avoid the extra executor?
   
   I think the problem is that checkFragments/verifyLedgerFragment tries to acquire a permit from a callback in `bookieClient.readEntry` call; but there is already a permit request before that call. I think we can simply add a flag to not request/release extra permit  in checkFragments/verifyLedgerFragment call chain in this case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1109583833

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1200975831

   @hangc0276 Reverted to the first approach, which is to introduce an extra executor to `LedgerChecker`. However, because I changed it to use daemon threads instead of user threads, the executor shuts down automatically, and we don't even need to add a shutdown method to `LedgerChecker`.
   
   I can't think of any other good solution to fix this bug...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] StevenLuMT commented on a diff in pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
StevenLuMT commented on code in PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#discussion_r933708341


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerChecker.java:
##########
@@ -144,20 +147,30 @@ public LedgerChecker(BookKeeper bkc) {
     }
 
     public LedgerChecker(BookieClient client, BookieWatcher watcher) {
-        this(client, watcher, -1);
+        this(client, watcher, null);
     }
 
-    public LedgerChecker(BookKeeper bkc, int inFlightReadEntryNum) {
-        this(bkc.getBookieClient(), bkc.getBookieWatcher(), inFlightReadEntryNum);
+    public LedgerChecker(BookKeeper bkc, ServerConfiguration conf) {
+        this(bkc.getBookieClient(), bkc.getBookieWatcher(), conf);
     }
 
-    public LedgerChecker(BookieClient client, BookieWatcher watcher, int inFlightReadEntryNum) {
+    public LedgerChecker(BookieClient client, BookieWatcher watcher, ServerConfiguration conf) {
         bookieClient = client;
         bookieWatcher = watcher;
-        if (inFlightReadEntryNum > 0) {
-            semaphore = new Semaphore(inFlightReadEntryNum);
+        if (conf != null) {
+            if (conf.getReadEntryRateInLedgerChecker() > 0) {
+                semaphore = null;
+                rateLimiter = RateLimiter.create(conf.getReadEntryRateInLedgerChecker());
+            } else if (conf.getInFlightReadEntryNumInLedgerChecker() > 0) {
+                semaphore = new Semaphore(conf.getInFlightReadEntryNumInLedgerChecker());

Review Comment:
   add
   `this.inFlightReadEntryNumInLedgerCheckerEnable=true`



##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerChecker.java:
##########
@@ -166,7 +179,9 @@ public LedgerChecker(BookieClient client, BookieWatcher watcher, int inFlightRea
      * blocking until all are available.
      */
     public void acquirePermit() throws InterruptedException {
-        if (null != semaphore) {
+        if (rateLimiter != null) {

Review Comment:
   add two switch,Would it be clearer?  have a look @massakam @eolivelli 
   ```
   if (this.readEntryRateInLedgerCheckerEnable){
          rateLimiter.acquire(1);
   } else (this.inFlightReadEntryNumInLedgerCheckerEnable) {
          semaphore.acquire(1);
   }
   ```



##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerChecker.java:
##########
@@ -144,20 +147,30 @@ public LedgerChecker(BookKeeper bkc) {
     }
 
     public LedgerChecker(BookieClient client, BookieWatcher watcher) {
-        this(client, watcher, -1);
+        this(client, watcher, null);
     }
 
-    public LedgerChecker(BookKeeper bkc, int inFlightReadEntryNum) {
-        this(bkc.getBookieClient(), bkc.getBookieWatcher(), inFlightReadEntryNum);
+    public LedgerChecker(BookKeeper bkc, ServerConfiguration conf) {
+        this(bkc.getBookieClient(), bkc.getBookieWatcher(), conf);
     }
 
-    public LedgerChecker(BookieClient client, BookieWatcher watcher, int inFlightReadEntryNum) {
+    public LedgerChecker(BookieClient client, BookieWatcher watcher, ServerConfiguration conf) {
         bookieClient = client;
         bookieWatcher = watcher;
-        if (inFlightReadEntryNum > 0) {
-            semaphore = new Semaphore(inFlightReadEntryNum);
+        if (conf != null) {
+            if (conf.getReadEntryRateInLedgerChecker() > 0) {
+                semaphore = null;
+                rateLimiter = RateLimiter.create(conf.getReadEntryRateInLedgerChecker());

Review Comment:
   add 
   `this.readEntryRateInLedgerCheckerEnable=true`
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] dlg99 commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
dlg99 commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1253025650

   @massakam @hangc0276 @eolivelli I spent a bit of time trying to understand the root cause and I think I got and idea about what's going on
   .
   This gist https://gist.github.com/dlg99/505849e1010a20c6d439ecd53f500a85  is more of a demo of what's going on than actual fix (after all, it breaks `testShouldGetTwoFrgamentsIfTwoBookiesFailedInSameEnsemble` - might be a test issue).
   
   So the problem is that semaphore acquired in the loop (+ in recursive calls) but mostly in the loops like 
   ```
   for (int i = 0; i < writeSet.size(); i++) 
   ```
   and 
   ```
   for (Long entryID: entriesToBeVerified)
   ```
   
   e.g. if writeSet is larger than number of semaphore permits, the loop will get stuck because callback will call `checkFragments(`) which calls `verifyLedgerFragment()` (recursive call) and then we get deadlock.
   
   @massakam I didn't look if we can guess max number of entriesToBeVerified so my workaround was to have a single permit per ReadManyEntriesCallback no matter how many entries it reads.
   For the writeset's case callback we can do similar thing with counting down number of processed reads.
   This is the simplest solution I could think of but it will not limit bk reads in progress, rather segments in progress. 
   
   other approach could be getting rid of recursion and instead use queue, that will be more work.
   
   It is possible you can come up with a better solution keeping the root cause in mind.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1253634019

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1258896324

   @hangc0276 @rdhabalia @eolivelli @StevenLuMT PTAL. And if there is no problem, please merge this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1163918086

   @dlg99 Hmm... you have a point there.
   
   However, I once tried to fix it in [the way you suggested first](https://github.com/apache/bookkeeper/pull/3214#issuecomment-1109162512), but I gave it up because the code became complicated and there was a high risk of missing the release of the acquired permit. In this approach, if an additional read is required in `EntryExistsCallback#readEntryComplete()`, it is necessary not to release the permit immediately, but after the subsequent read is complete.
   
   I think it would be better to move to throttling with a rate limiter rather than to keep the RIPs approach and leaving the possibility of getting stuck.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1109758138

   @dlg99 Changed the implementation to achieve throttling using `RateLimiter` instead of `Semaphore`. What do you think?
   
   I think this is simpler than the implementation of adding a new flag and requesting/releasing a permit according to its value.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1175825965

   ping @dlg99 @hangc0276


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1200360826

   > @massakam I think that Throttle is not a good approach here.
   > 
   > Requests-in-progress (RIPs) approach is very similar when everything works but easier to tune and self-adjusts to remote system's dynamically changing capabilities while throttle is impossible to tuen for that. Imagine you configured 20 RIPs and one request takes 100ms normally. Now you have 200 RPS (requests per second). Remote system (bookie) slowed down and now take 500ms to process a request, now you are down to 40RPS without overloading the bookie, changing config etc. Bookie speeds up and can process request in 10ms - the rate dynamically goes up to 2000rps.
   > 
   > Let's use throttle now: set it to 200 request per second. In the normal case everything looks ok and the same as in normal case above. Now the bookie slows down. Throttle keeps on letting through 200 RPS even though the bookie can only handle 40, with that requests will pile up and eventually make result in OOM on bookie or client side. Bookie speeds up and can process request in 10ms - the rate stays at 200rps.
   > 
   > I'd prefer to track down where the semaphore is not released and fix that.
   > 
   > Throttle is simple, does not require release etc. but it has its drawback and was not used there for a reason.
   
   +1 for finding out the root cause of semaphore not released and fix that.  /cc @massakam 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] rdhabalia commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
rdhabalia commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1211526187

   I think the main issue is #2973 implementation. You should not add a blocking call to throttle unless you have sync API.
   The right fix for #2973 is to check the permit and if the process doesn't have a permit then schedule a task to retry and check the permit rather blocking the thread. 
   so, we should have added `tryAcquire()` instead of blocking `acquire()`. with that fix we won't have thread issues and also will not require the extra threads.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] StevenLuMT commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
StevenLuMT commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1225342724

   fix old workflow,please see #3455 for detail


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1254060104

   @dlg99 That callback method is called from `EntryExistsCallback#readEntryComplete`. `releasePermit()` is executed `numReads` times in `EntryExistsCallback#readEntryComplete`, so `releasePermit()` will be executed `numReads + 1` times in total.
   ```java
       private class EntryExistsCallback implements ReadEntryCallback {
           AtomicBoolean entryMayExist = new AtomicBoolean(false);
           final AtomicInteger numReads;
           final GenericCallback<Boolean> cb;
   
           EntryExistsCallback(int numReads,
                               GenericCallback<Boolean> cb) {
               this.numReads = new AtomicInteger(numReads);
               this.cb = cb;
           }
   
           @Override
           public void readEntryComplete(int rc, long ledgerId, long entryId,
                                         ByteBuf buffer, Object ctx) {
               releasePermit(); // Executed `numReads` times
               if (BKException.Code.NoSuchEntryException != rc && BKException.Code.NoSuchLedgerExistsException != rc
                       && BKException.Code.NoSuchLedgerExistsOnMetadataServerException != rc) {
                   entryMayExist.set(true);
               }
   
               if (numReads.decrementAndGet() == 0) {
                   cb.operationComplete(rc, entryMayExist.get()); // Called `GenericCallback#operationComplete` and executed `releasePermit()` one more time
               }
           }
       }
   ```
   In fact, when I logged the value of `semaphore.availablePermits()` after the test was completed, it was `inFlightReadEntryNum + 1`. But it should return to `inFlightReadEntryNum`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] massakam commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
massakam commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1256338402

   rerun failure checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] eolivelli commented on pull request #3214: Issue 3070: Fix bug where checkAllLedgers gets stuck when read throttling is enabled

Posted by GitBox <gi...@apache.org>.
eolivelli commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1262300237

   @massakam 
   Merged, thanks !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org