Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2022/06/21 21:45:09 UTC

[GitHub] [bookkeeper] dlg99 opened a new issue, #3351: Auditor exits silently on ZK timeout

dlg99 opened a new issue, #3351:
URL: https://github.com/apache/bookkeeper/issues/3351

   **BUG REPORT**
   
   ***Describe the bug***
   
   Autorecovery running standalone:
   
   ```
   2022-06-14T05:14:25,461 [main] INFO  org.apache.bookkeeper.client.BookKeeperAdmin - Resetting LostBookieRecoveryDelay value: 0, to kickstart audit task
   2022-06-14T05:14:25,461 [main] DEBUG org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - setLostBookieRecoveryDelay()
   2022-06-14T05:14:25,612 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x3000418b60b0047 closed
   2022-06-14T05:14:25,612 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x3000418b60b0047
   2022-06-14T05:14:25,612 [main] INFO  org.apache.bookkeeper.meta.ZkLedgerAuditorManager - Shutting down AuditorElector
   ```
   
   This happens after loss of ZK connectivity.
   
   Eventually this can lead to a situation where there is no Auditor in the cluster.
   
   ***Expected behavior***
   
   Auditor shutdown should either restart the thread and attempt to reconnect to ZK if needed, or trigger shutdown of the AR service (or fail its health check) so that k8s has a chance to restart the service.
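
The health check option could look roughly like the sketch below. It is only an illustration under stated assumptions, not BookKeeper code: `AutoRecoveryHealth` and `isHealthy` are hypothetical names, and the suppliers stand in for whatever "is this component still running" checks the auditor elector and replication worker expose.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: none of these names exist in BookKeeper.
final class AutoRecoveryHealth {
    private final BooleanSupplier[] components;

    // Each supplier reports whether one component (e.g. the auditor elector) is still running.
    AutoRecoveryHealth(BooleanSupplier... components) {
        this.components = components.clone();
    }

    // Healthy only while every registered component still reports running.
    boolean isHealthy() {
        for (BooleanSupplier component : components) {
            if (!component.getAsBoolean()) {
                return false;
            }
        }
        return true;
    }
}
```

A liveness probe backed by `isHealthy()` would start failing as soon as the auditor stops, which gives k8s the signal it needs to restart the service.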


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] MarvinCai commented on issue #3351: Auditor exits silently on ZK timeout

Posted by GitBox <gi...@apache.org>.
MarvinCai commented on issue #3351:
URL: https://github.com/apache/bookkeeper/issues/3351#issuecomment-1242756625

   @dlg99 according to this code snippet, the DeathWatcher should be able to catch this case and shut down the whole autorecovery, right? https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/AutoRecoveryMain.java#L213-L240
   ```
               setUncaughtExceptionHandler((thread, cause) -> {
                   LOG.info("AutoRecoveryDeathWatcher exited loop due to uncaught exception from thread {}",
                       thread.getName(), cause);
                   shutdown();
               });
           }
   
           @Override
           public void run() {
               while (true) {
                   try {
                       Thread.sleep(watchInterval);
                   } catch (InterruptedException ie) {
                       Thread.currentThread().interrupt();
                   }
                   // If any one service not running, then shutdown peer.
                   if (!autoRecoveryMain.auditorElector.isRunning() || !autoRecoveryMain.replicationWorker.isRunning()) {
                       LOG.info(
                               "AutoRecoveryDeathWatcher noticed the AutoRecovery is not running any more,"
                               + "exiting the watch loop!");
                       /*
                        * death watcher has noticed that AutoRecovery is not
                        * running any more throw an exception to fail the death
                        * watcher thread and it will trigger the uncaught exception
                        * handler to handle this "AutoRecovery not running"
                        * situation.
                        */
                       throw new RuntimeException("AutoRecovery is not running any more");
                   }
   ```
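
For reference, the mechanism in the quoted snippet can be reproduced in isolation. The sketch below is a simplified stand-in, not the code in `AutoRecoveryMain`: `DeathWatcherSketch`, the two suppliers, and the `shutdown` callback are hypothetical names. It shows that throwing from `run()` invokes the thread's uncaught exception handler, which in turn triggers the shutdown.

```java
import java.util.function.BooleanSupplier;

// Hypothetical, simplified stand-in for the watcher pattern quoted above.
final class DeathWatcherSketch extends Thread {
    private final long watchIntervalMs;
    private final BooleanSupplier auditorRunning;
    private final BooleanSupplier workerRunning;

    DeathWatcherSketch(long watchIntervalMs,
                       BooleanSupplier auditorRunning,
                       BooleanSupplier workerRunning,
                       Runnable shutdown) {
        super("DeathWatcherSketch");
        this.watchIntervalMs = watchIntervalMs;
        this.auditorRunning = auditorRunning;
        this.workerRunning = workerRunning;
        setDaemon(true);
        // Called on this thread when run() throws; it triggers the shutdown callback.
        setUncaughtExceptionHandler((thread, cause) -> shutdown.run());
    }

    @Override
    public void run() {
        while (true) {
            try {
                Thread.sleep(watchIntervalMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return; // assumption: unlike the quoted loop, this sketch stops when interrupted
            }
            // If either component has stopped, fail the watcher thread on purpose.
            if (!auditorRunning.getAsBoolean() || !workerRunning.getAsBoolean()) {
                throw new RuntimeException("AutoRecovery is not running any more");
            }
        }
    }
}
```

The pattern only works if the running checks actually flip to false when a component stops. If the auditor shuts down but its check keeps returning true, the watcher never throws and the process stays up without an Auditor.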

