You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2021/03/17 18:26:02 UTC

[GitHub] [hbase] apurtell commented on a change in pull request #3027: HBASE-25612 HMaster should abort if ReplicationLogCleaner encounters …

apurtell commented on a change in pull request #3027:
URL: https://github.com/apache/hbase/pull/3027#discussion_r596277274



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java
##########
@@ -61,11 +64,23 @@ public void preClean() {
       // but they won't be deleted because they're not in the checking set.
       wals = queueStorage.getAllWALs();
     } catch (ReplicationException e) {
-      LOG.warn("Failed to read zookeeper, skipping checking deletable files");
+      LOG.warn("Failed to read zookeeper, skipping checking deletable files", e);
+      handleSessionExpiredException(e);
       wals = null;
     }
   }
 
+  /*
+    If preClean call encounters SessionExpiredException then we should abort HMaster.
+   */
+  private void handleSessionExpiredException(ReplicationException e) {
+    if (e.getCause() != null && e.getCause() instanceof KeeperException.SessionExpiredException) {

Review comment:
       There are other KeeperException reasons that should probably be considered fatal:
   
   KeeperException.APIErrorException: Not expected, we've messed up in some probably unrecoverable way
   KeeperException.AuthFailedException: We don't have access to znodes we need to have access to
   KeeperException.InvalidACLException: Same, someone messed up the znode ACLs on znodes we need
   KeeperException.InvalidCallbackException: Not expected, we've messed up in some probably unrecoverable way
   KeeperException.NoAuthException: We don't have access to znodes we need to have access to
   KeeperException.RuntimeInconsistencyException: ZK data is corrupt
   KeeperException.SessionMovedException: Maybe this is something that RZK could handle by chasing the session via reconnects, but we don't do it today I don't believe, so is as bad as SessionExpired?
   KeeperException.SystemErrorException: ZK server failure
   KeeperException.UnimplementedException: Not expected, we've messed up in some probably unrecoverable way
   
   and of course KeeperException.SessionExpiredException
   
   Should these be handled too? Consider a ZKUtil static helper method ... call it ZKUtil#isFatalKeeperException ... include these cases and call it. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org