You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2018/03/05 21:58:23 UTC

[GitHub] dlg99 opened a new issue #1229: PendingAddOp can get recycled before it gets executed

dlg99 opened a new issue #1229: PendingAddOp can get recycled before it gets executed
URL: https://github.com/apache/bookkeeper/issues/1229
 
 
   **BUG REPORT**
   
   Scenario:
   many async writes submitted/pending for the same ledger.
   Write fails due to not enough bookies.
   Ledge handle cancels all pending adds. 
   PendingAddOps of these adds get recycled.
   Pending add finally gets its chance to execute.
   LH at that point happens to be null (recycled already) hence Preconditions.checkNotNull throws exception.
   Exception is logged and swallowed by SafeRunnable.
   
   Problems:
   - unnecessary exceptions affecting performance and generating garbage in the log.
   - it is possible that recycled op gets reused and executed instead of original one. I think it is possible to get second write of the same entry to single bookie in this case in leu of write to another bookie though I did not try to repro that specific case.
   
   
   Minimized version of repro, worked on current master / 3e51b8116c363604f93bdd0c6b6b71ef3387c8a8 :
   
   throw in the following test into SlowBookieTest suite:
   
   ```java
       @Test
       public void testBookieFailure() throws Exception {
           ClientConfiguration conf = new ClientConfiguration();
           conf.setReadTimeout(5)
                   .setAddEntryQuorumTimeout(5)
                   .setAddEntryQuorumTimeout(10)
                   .setZkServers(zkUtil.getZooKeeperConnectString());
   
           BookKeeper bkc = new BookKeeper(conf);
   
           byte[] pwd = new byte[] {};
           final LedgerHandle lh = bkc.createLedger(4, 4, 3, BookKeeper.DigestType.CRC32, pwd);
           final AtomicBoolean finished = new AtomicBoolean(false);
           final AtomicBoolean failTest = new AtomicBoolean(false);
           final byte[] entry = "Test Entry".getBytes();
   
           Thread t = new Thread(() -> {
               try {
                   while (!finished.get()) {
                       lh.asyncAddEntry(entry, (rc, lh1, entryId, ctx) -> {
                           if (rc != BKException.Code.OK) {
                               failTest.set(true);
                               finished.set(true);
                           }
                       }, null);
                   }
               } catch (Exception e) {
                   LOG.error("Exception in add entry thread", e);
                   failTest.set(true);
               }
           });
   
           killBookie(0);
   
           t.start();
   
           Thread.sleep(2_000);
   
           finished.set(true);
           t.join();
   
           assertFalse(failTest.get());
   
           lh.close();
   
           LedgerHandle lh2 = bkc.openLedger(lh.getId(), BookKeeper.DigestType.CRC32, pwd);
           LedgerChecker lc = new LedgerChecker(bkc);
           final CountDownLatch checklatch = new CountDownLatch(1);
           final AtomicInteger numFragments = new AtomicInteger(-1);
           lc.checkLedger(lh2, new GenericCallback<Set<LedgerFragment>>() {
               public void operationComplete(int rc, Set<LedgerFragment> fragments) {
                   LOG.debug("Checked ledgers returned {} {}", rc, fragments);
                   if (rc == BKException.Code.OK) {
                       numFragments.set(fragments.size());
                       LOG.error("Checked ledgers returned {} {}", rc, fragments);
                   }
                   checklatch.countDown();
               }
           });
           checklatch.await();
           assertEquals("There should be no missing fragments", 0, numFragments.get());
       }
   ```
   
   Note log messages after execution:
   
   ```
   2018-03-05 13:41:43,751 - WARN  - [BookKeeperClientWorker-OrderedSafeExecutor-0-0:RackawareEnsemblePlacementPolicyImpl@695] - Failed to choose a bookie: excluded [<Bookie:127.0.0.1:50656>, <Bookie:127.0.0.1:50660>, <Bookie:127.0.0.1:50662>, <Bookie:127.0.0.1:50658>], fallback to choose bookie randomly from the cluster.
   2018-03-05 13:41:43,751 - WARN  - [BookKeeperClientWorker-OrderedSafeExecutor-0-0:RackawareEnsemblePlacementPolicyImpl@886] - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:50656>, <Bookie:127.0.0.1:50660>, <Bookie:127.0.0.1:50662>, <Bookie:127.0.0.1:50658>], allBookies [].
   2018-03-05 13:41:43,751 - ERROR - [BookKeeperClientWorker-OrderedSafeExecutor-0-0:LedgerHandle@1652] - Could not get additional bookie to remake ensemble, closing ledger: 0
   2018-03-05 13:41:43,751 - ERROR - [BookKeeperClientWorker-OrderedSafeExecutor-0-0:LedgerHandle@1495] - Closing ledger 0 due to NotEnoughBookiesException: Not enough non-faulty bookies available
   2018-03-05 13:41:43,751 - ERROR - [BookKeeperClientWorker-OrderedSafeExecutor-0-0:SafeRunnable@38] - Unexpected throwable caught 
   java.lang.NullPointerException
   	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:770)
   	at org.apache.bookkeeper.client.PendingAddOp.safeRun(PendingAddOp.java:229)
   	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
   	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
   	at java.lang.Thread.run(Thread.java:745)
   2018-03-05 13:41:43,751 - ERROR - [BookKeeperClientWorker-OrderedSafeExecutor-0-0:SafeRunnable@38] - Unexpected throwable caught 
   java.lang.NullPointerException
   	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:770)
   	at org.apache.bookkeeper.client.PendingAddOp.safeRun(PendingAddOp.java:229)
   	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
   	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
   	at java.lang.Thread.run(Thread.java:745)
   ```
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services