Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2020/07/20 01:25:13 UTC

[GitHub] [hbase] cuibo01 commented on a change in pull request #2084: HBASE-22263 Master creates duplicate ServerCrashProcedure on initiali…

cuibo01 commented on a change in pull request #2084:
URL: https://github.com/apache/hbase/pull/2084#discussion_r456983850



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
##########
@@ -846,10 +849,37 @@ private void finishActiveMasterInitialization(MonitoredTask status)
     if (isStopped()) return;
 
     status.setStatus("Submitting log splitting work for previously failed region servers");
+
+    // Grab the list of procedures once. SCPs from pre-crash should all be loaded, and can't progress
+    // until AM joins the cluster. Any SCPs that got added after we get the log folder list should be
+    // for a different start code.
+    final Set<ServerName> alreadyHasSCP = new HashSet<>();
+    long scpCount = 0;
+    for (ProcedureInfo procInfo : this.procedureExecutor.listProcedures() ) {
+      final Procedure proc = this.procedureExecutor.getProcedure(procInfo.getProcId());
+      if (proc != null) {
+        if (proc instanceof ServerCrashProcedure && !(proc.isFinished() || proc.isSuccess())) {
+          scpCount++;
+          alreadyHasSCP.add(((ServerCrashProcedure)proc).getServerName());
+        }
+      }
+    }
+    LOG.info("Restored procedures include " + scpCount + " SCP covering " + alreadyHasSCP.size() +
+        " ServerName.");
+    
+ 
+    LOG.info("Checking " + previouslyFailedServers.size() + " previously failed servers (seen via wals) for existing SCP.");
+    // AM should be in "not yet init" and these should all be queued
     // Master has recovered hbase:meta region server and we put
     // other failed region servers in a queue to be handled later by SSH
     for (ServerName tmpServer : previouslyFailedServers) {
-      this.serverManager.processDeadServer(tmpServer, true);
+      if (alreadyHasSCP.contains(tmpServer)) {
+        LOG.info("Skipping failed server in FS because it already has a queued SCP: " + tmpServer);
+        this.serverManager.getDeadServers().add(tmpServer);

Review comment:
       > this looks like what's different from my old patch, is that right? have I missed anything else?
   
   Yeah, this is what's different from your old patch.
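
For anyone following the thread, the skip logic in the hunk above can be sketched in isolation roughly as below. This is only an illustration under assumptions: `ScpDedupSketch`, `RestoredProc`, and `serversNeedingScp` are hypothetical stand-ins, server names are plain strings, and none of it is the HBase `Procedure` or `ServerManager` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ScpDedupSketch {
  /** Hypothetical stand-in for a restored procedure; not the HBase Procedure class. */
  static final class RestoredProc {
    final String serverName;    // server the procedure is recovering
    final boolean isCrashProc;  // plays the role of "instanceof ServerCrashProcedure"
    final boolean finished;     // plays the role of isFinished()/isSuccess()

    RestoredProc(String serverName, boolean isCrashProc, boolean finished) {
      this.serverName = serverName;
      this.isCrashProc = isCrashProc;
      this.finished = finished;
    }
  }

  /** Returns the previously failed servers that do not already have an unfinished crash procedure. */
  static List<String> serversNeedingScp(List<RestoredProc> restored, List<String> previouslyFailed) {
    // Pass 1: collect servers already covered by an unfinished crash procedure.
    Set<String> alreadyHasScp = new HashSet<>();
    for (RestoredProc proc : restored) {
      if (proc.isCrashProc && !proc.finished) {
        alreadyHasScp.add(proc.serverName);
      }
    }
    // Pass 2: keep only the failed servers that are not covered yet.
    List<String> needScp = new ArrayList<>();
    for (String server : previouslyFailed) {
      if (!alreadyHasScp.contains(server)) {
        needScp.add(server);
      }
    }
    return needScp;
  }

  public static void main(String[] args) {
    // Illustrative data only; the names mimic the host,port,startcode shape.
    List<RestoredProc> restored = Arrays.asList(
        new RestoredProc("rs1,16020,1", true, false),   // unfinished crash procedure: skip
        new RestoredProc("rs2,16020,1", true, true),    // finished crash procedure: not covered
        new RestoredProc("rs3,16020,1", false, false)); // unrelated procedure: not covered
    List<String> failed = Arrays.asList("rs1,16020,1", "rs2,16020,1", "rs3,16020,1");
    System.out.println(serversNeedingScp(restored, failed));
  }
}
```

In the actual patch the skipped servers are still added to the dead-server list; the sketch leaves that out and only shows the set-membership check that avoids queuing a second SCP.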



