You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/06/02 20:59:43 UTC

[GitHub] [accumulo] keith-turner commented on a change in pull request #2132: Make DeadCompactionDetector handle network hiccups

keith-turner commented on a change in pull request #2132:
URL: https://github.com/apache/accumulo/pull/2132#discussion_r644308904



##########
File path: server/compaction-coordinator/src/main/java/org/apache/accumulo/coordinator/DeadCompactionDetector.java
##########
@@ -85,31 +103,55 @@ private void detectDeadCompactions() {
 
     running.forEach((ecid) -> {
       if (tabletCompactions.remove(ecid) != null) {
-        log.trace("Removed {} running on a compactor", ecid);
+        log.trace("Removed compaction {} running on a compactor", ecid);
+      }
+      if (this.deadCompactions.remove(ecid) != null) {
+        log.trace("Removed {} from the dead compaction map, it's running on a compactor", ecid);
       }
     });
 
     // Determine which compactions are currently committing and remove those
     context.getAmple().getExternalCompactionFinalStates()
-        .map(ecfs -> ecfs.getExternalCompactionId()).forEach(tabletCompactions::remove);
+        .map(ecfs -> ecfs.getExternalCompactionId()).forEach(ecid -> {
+          if (tabletCompactions.remove(ecid) != null) {
+            log.trace("Removed compaction {} that is committing", ecid);
+          }
+          if (this.deadCompactions.remove(ecid) != null) {
+            log.trace("Removed {} from the dead compaction map, it's committing", ecid);
+          }
+        });
 
-    tabletCompactions
-        .forEach((ecid, extent) -> log.debug("Detected dead compaction {} {}", ecid, extent));
+    tabletCompactions.forEach((ecid, extent) -> {
+      log.debug("Possible dead compaction detected {} {}", ecid, extent);
+      this.deadCompactions.putIfAbsent(ecid, System.currentTimeMillis());

Review comment:
       Using a count instead of time may simplify code.  Not sure if the following is correct, I think it may increment a count or set it to 1 if it does not exists.
   
   ```suggestion
         this.deadCompactions.merge(ecid, 1, Long::sum);
   ```

##########
File path: server/compaction-coordinator/src/main/java/org/apache/accumulo/coordinator/DeadCompactionDetector.java
##########
@@ -85,31 +103,55 @@ private void detectDeadCompactions() {
 
     running.forEach((ecid) -> {
       if (tabletCompactions.remove(ecid) != null) {
-        log.trace("Removed {} running on a compactor", ecid);
+        log.trace("Removed compaction {} running on a compactor", ecid);
+      }
+      if (this.deadCompactions.remove(ecid) != null) {
+        log.trace("Removed {} from the dead compaction map, it's running on a compactor", ecid);
       }
     });
 
     // Determine which compactions are currently committing and remove those
     context.getAmple().getExternalCompactionFinalStates()
-        .map(ecfs -> ecfs.getExternalCompactionId()).forEach(tabletCompactions::remove);
+        .map(ecfs -> ecfs.getExternalCompactionId()).forEach(ecid -> {
+          if (tabletCompactions.remove(ecid) != null) {
+            log.trace("Removed compaction {} that is committing", ecid);
+          }
+          if (this.deadCompactions.remove(ecid) != null) {
+            log.trace("Removed {} from the dead compaction map, it's committing", ecid);
+          }
+        });
 
-    tabletCompactions
-        .forEach((ecid, extent) -> log.debug("Detected dead compaction {} {}", ecid, extent));
+    tabletCompactions.forEach((ecid, extent) -> {
+      log.debug("Possible dead compaction detected {} {}", ecid, extent);
+      this.deadCompactions.putIfAbsent(ecid, System.currentTimeMillis());
+    });
 
     // Everything left in tabletCompactions is no longer running anywhere and should be failed.
     // Its possible that a compaction committed while going through the steps above, if so then
     // that is ok and marking it failed will end up being a no-op.
-    try {
-      coordinator.compactionFailed(tabletCompactions);
-    } catch (UnknownCompactionIdException e) {
-      // One or more Ids was not in the Running compaction list. This is ok to ignore.
-    }
+    long now = System.currentTimeMillis();
+    this.deadCompactions.forEach((eci, startTime) -> {
+      if ((now - startTime) > threshold) {
+        // Compaction believed to be dead for two cycles. Fail it.
+        try {
+          log.warn(
+              "Failing compaction {} which is believed to be dead. Last seen at {} and not seen since.",
+              eci, startTime);
+          coordinator.compactionFailed(tabletCompactions);
+          this.deadCompactions.remove(eci);
+        } catch (UnknownCompactionIdException e) {
+          // One or more Ids was not in the Running compaction list. This is ok to ignore.
+        }
+      }
+    });

Review comment:
       It would be good to only call `coordinator.compactionFailed()` once because it uses a batch writer in the impl instead of per entry.  The following may do this and uses a count.
   
   ```suggestion
       var compactionsToFail = deadCompactions.entrySet().stream().filter(e -> e.getValue() >= 2).map(e -> e.getKey()).collect(toSet());
      tabletCompactions.keySet().retainAll(compactionsToFail);
      coordinator.compactionFailed(tabletCompactions);
      deadCompactions.keySet().removeAll(compactionsToFail);
   ```

##########
File path: server/compaction-coordinator/src/main/java/org/apache/accumulo/coordinator/DeadCompactionDetector.java
##########
@@ -75,6 +82,17 @@ private void detectDeadCompactions() {
       tabletCompactions.forEach((ecid, extent) -> log.trace("Saw {} for {}", ecid, extent));
     }
 
+    // Remove from the dead map any compactions that the Tablet's
+    // do not think are running any more.
+    this.deadCompactions.keySet().forEach(eci -> {
+      if (!tabletCompactions.containsKey(eci)) {
+        if (this.deadCompactions.remove(eci) != null)
+          log.trace(
+              "Removed {} from the dead compaction map, no tablet thinks this compaction is running",
+              eci);
+      }
+    });

Review comment:
       I think the following does the same thing.
   
   ```suggestion
       this.deadCompactions.keySet().retailAll(tabletCompactions.keySet());
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org