Posted to commits@bookkeeper.apache.org by eo...@apache.org on 2020/03/10 13:18:02 UTC

[bookkeeper] branch master updated: fix bookie decommission sleep timeout value is negative bug

This is an automated email from the ASF dual-hosted git repository.

eolivelli pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/bookkeeper.git


The following commit(s) were added to refs/heads/master by this push:
     new 025d99f  fix bookie decommission sleep timeout value is negative bug
025d99f is described below

commit 025d99f5a2a4cc02f3780a11b58a9b9d6c9940c3
Author: hangc0276 <ha...@163.com>
AuthorDate: Tue Mar 10 21:17:53 2020 +0800

    fix bookie decommission sleep timeout value is negative bug
    
    When decommissioning a bookie that holds a large enough number of ledgers, the computed sleep timeout overflows and becomes negative, and the decommission operation aborts with the following exception:
    ```
    14:12:56.982 [main] INFO  org.apache.bookkeeper.client.BookKeeperAdmin - Count of Ledgers which need to be rereplicated: 272752
    14:12:56.983 [main] ERROR org.apache.bookkeeper.bookie.BookieShell - Received exception in DecommissionBookieCmd
    java.lang.IllegalArgumentException: timeout value is negative
    	at java.lang.Thread.sleep(Native Method) ~[?:?]
    	at org.apache.bookkeeper.client.BookKeeperAdmin.waitForLedgersToBeReplicated(BookKeeperAdmin.java:1528) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
    	at org.apache.bookkeeper.client.BookKeeperAdmin.decommissionBookie(BookKeeperAdmin.java:1500) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
    	at org.apache.bookkeeper.bookie.BookieShell$DecommissionBookieCmd.runCmd(BookieShell.java:2664) [org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
    	at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:277) [org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
    	at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:3081) [org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
    	at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:3172) [org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
    14:12:57.013 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x206189927840052 closed
    ```
    The code that throws the exception (before this fix) is:
    ```
    private void waitForLedgersToBeReplicated(Collection<Long> ledgers, BookieSocketAddress thisBookieAddress,
            LedgerManager ledgerManager) throws InterruptedException, TimeoutException {
        int maxSleepTimeInBetweenChecks = 10 * 60 * 1000; // 10 minutes
        int sleepTimePerLedger = 10 * 1000; // 10 secs
        Predicate<Long> validateBookieIsNotPartOfEnsemble = ledgerId -> !areEntriesOfLedgerStoredInTheBookie(ledgerId,
                thisBookieAddress, ledgerManager);
        while (!ledgers.isEmpty()) {
            LOG.info("Count of Ledgers which need to be rereplicated: {}", ledgers.size());
            // int * int is evaluated in 32-bit arithmetic and can overflow before the comparison
            int sleepTimeForThisCheck = ledgers.size() * sleepTimePerLedger > maxSleepTimeInBetweenChecks
                    ? maxSleepTimeInBetweenChecks : ledgers.size() * sleepTimePerLedger;
            Thread.sleep(sleepTimeForThisCheck);
            LOG.debug("Making sure following ledgers replication to be completed: {}", ledgers);
            ledgers.removeIf(validateBookieIsNotPartOfEnsemble);
        }
    }
    ```
    Here `272752` ledgers need to be re-replicated. When computing `sleepTimeForThisCheck`, the product
    `ledgers.size() * sleepTimePerLedger` is evaluated in `int` arithmetic: `272752 * 10 * 1000 = 2727520000`.
    This exceeds `Integer.MAX_VALUE` (`2147483647`) and wraps around to `-1567447296`. A negative value is not greater than
    `maxSleepTimeInBetweenChecks`, so the ternary picks the overflowed product and `sleepTimeForThisCheck` becomes `-1567447296`.
    `Thread.sleep` then throws `java.lang.IllegalArgumentException: timeout value is negative`.
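    
    A minimal standalone sketch reproduces the wrap-around with the numbers from the log above. The class `SleepOverflowDemo` and its helper methods are hypothetical, written only to illustrate the arithmetic; they are not part of BookKeeper.
    ```java
    // Hypothetical demo class, not part of BookKeeper: reproduces the int overflow described above.
    final class SleepOverflowDemo {
        static final int SLEEP_TIME_PER_LEDGER = 10 * 1000; // 10 secs, as in waitForLedgersToBeReplicated

        // 32-bit multiply, as in the original expression: silently wraps around on overflow.
        static int intProduct(int ledgerCount) {
            return ledgerCount * SLEEP_TIME_PER_LEDGER;
        }

        // Widening one operand to long makes the whole multiply 64-bit, so the result is exact here.
        static long longProduct(int ledgerCount) {
            return (long) ledgerCount * SLEEP_TIME_PER_LEDGER;
        }

        public static void main(String[] args) {
            int ledgerCount = 272752; // ledger count reported in the log above
            System.out.println(intProduct(ledgerCount));  // -1567447296
            System.out.println(longProduct(ledgerCount)); // 2727520000
            System.out.println(Integer.MAX_VALUE);        // 2147483647
        }
    }
    ```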
    
    Reviewers: Enrico Olivelli <eo...@gmail.com>, Jia Zhai <zh...@apache.org>
    
    This closes #2284 from hangc0276/bug_fix
---
 .../src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java     | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java b/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java
index 88a7c08..cac1d9d 100644
--- a/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java
+++ b/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java
@@ -1536,7 +1536,7 @@ public class BookKeeperAdmin implements AutoCloseable {
                 thisBookieAddress, ledgerManager);
         while (!ledgers.isEmpty()) {
             LOG.info("Count of Ledgers which need to be rereplicated: {}", ledgers.size());
-            int sleepTimeForThisCheck = ledgers.size() * sleepTimePerLedger > maxSleepTimeInBetweenChecks
+            int sleepTimeForThisCheck = (long) ledgers.size() * sleepTimePerLedger > maxSleepTimeInBetweenChecks
                     ? maxSleepTimeInBetweenChecks : ledgers.size() * sleepTimePerLedger;
             Thread.sleep(sleepTimeForThisCheck);
             LOG.debug("Making sure following ledgers replication to be completed: {}", ledgers);
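
The patch widens only the left-hand operand of the comparison to `long`, so the comparison is done on the exact product. The `int` multiplication in the false branch is evaluated only when that exact product is at most `maxSleepTimeInBetweenChecks` (600000 ms), so it can no longer overflow. The sketch below isolates the patched expression; the class `FixedSleepTime` and the method `sleepTimeForThisCheck(int)` are hypothetical (in `BookKeeperAdmin` the expression is inline in the loop).

```java
// Hypothetical standalone copy of the patched expression, for illustration only.
final class FixedSleepTime {
    static final int MAX_SLEEP_TIME_IN_BETWEEN_CHECKS = 10 * 60 * 1000; // 10 minutes
    static final int SLEEP_TIME_PER_LEDGER = 10 * 1000;                 // 10 secs

    // Comparison in long arithmetic; the int branch is taken only when the product is <= 600000.
    static int sleepTimeForThisCheck(int ledgerCount) {
        return (long) ledgerCount * SLEEP_TIME_PER_LEDGER > MAX_SLEEP_TIME_IN_BETWEEN_CHECKS
                ? MAX_SLEEP_TIME_IN_BETWEEN_CHECKS : ledgerCount * SLEEP_TIME_PER_LEDGER;
    }

    public static void main(String[] args) {
        System.out.println(sleepTimeForThisCheck(272752)); // 600000 (capped at 10 minutes)
        System.out.println(sleepTimeForThisCheck(3));      // 30000
    }
}
```

An equivalent way to express the same clamp would be `(int) Math.min((long) ledgerCount * SLEEP_TIME_PER_LEDGER, MAX_SLEEP_TIME_IN_BETWEEN_CHECKS)`; the one-line cast in the patch keeps the diff minimal.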