Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/06/29 14:38:09 UTC

[GitHub] [accumulo] milleruntime commented on a diff in pull request #2792: closes #1377 - ensure all tables are checked ...

milleruntime commented on code in PR #2792:
URL: https://github.com/apache/accumulo/pull/2792#discussion_r909832662


##########
server/gc/src/main/java/org/apache/accumulo/gc/GCRun.java:
##########
@@ -457,4 +459,28 @@ public long getErrorsStat() {
   public long getCandidatesStat() {
     return candidates;
   }
+
+  @Override
+  public boolean isRootTable() {
+    return level == DataLevel.ROOT;
+  }
+
+  @Override
+  public boolean isMetadataTable() {
+    return level == DataLevel.METADATA;
+  }
+
+  @Override
+  public Set<TableId> getCandidateTableIDs() {
+    if (isRootTable()) {
+      return Collections.singleton(MetadataTable.ID);
+    } else if (isMetadataTable()) {
+      Set<TableId> tableIds = new HashSet<>(getTableIDs());

Review Comment:
   This is OK, but calling `getTableIDs()` on its own is probably not enough on a highly active cluster. It just calls the API and returns whatever is cached at that moment.
   
   @ctubbsii's comment from the original PR: "It can't detect tables that existed at the start but skipped, if the ZooCache wasn't up-to-date, and it can't detect skipping over any tables that were created or deleted during the scan. This is especially problematic if a new table was created as the result of a clone operation, which will duplicate all the file references for the new table."
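
To make the concern concrete, here is a minimal sketch of one way a scan could be checked against the table ID set. It is not the code in this PR and not an Accumulo API: the class `TableIdSnapshotCheck`, the method `sawAllTables`, and the use of plain `String` IDs are all hypothetical. It also assumes the supplier reads authoritative, current state; if it only reads a stale cache, the check inherits the staleness problem described above.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names exist in Accumulo.
final class TableIdSnapshotCheck {

  private TableIdSnapshotCheck() {}

  /**
   * Runs the scan between two snapshots of the table ID set.
   *
   * @param tableIdSource assumed to return the current, authoritative set of table IDs
   * @param scan the work that walks the candidates and fills seenDuringScan
   * @param seenDuringScan table IDs the scan actually encountered
   * @return true only if no table was created or deleted while the scan ran
   *         and every table known before the scan was encountered
   */
  static boolean sawAllTables(Supplier<Set<String>> tableIdSource, Runnable scan,
      Set<String> seenDuringScan) {
    // Copy the set so later changes in the source cannot alter the snapshot.
    Set<String> before = new HashSet<>(tableIdSource.get());

    scan.run();

    Set<String> after = new HashSet<>(tableIdSource.get());

    // A difference means a table was created (for example by a clone)
    // or deleted during the scan, so the result cannot be trusted.
    if (!before.equals(after)) {
      return false;
    }

    // Every table that existed for the whole scan must have been seen.
    return seenDuringScan.containsAll(before);
  }
}
```

A caller would treat a `false` result as "do not delete anything this cycle" and retry later. Comparing snapshots only detects a change; it does not say which table changed or close the window between the second snapshot and the delete.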


