You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by "keith-turner (via GitHub)" <gi...@apache.org> on 2023/09/15 18:17:23 UTC

[GitHub] [accumulo] keith-turner commented on a diff in pull request #3738: Remove GC candidates found to still be referenced (WIP)

keith-turner commented on code in PR #3738:
URL: https://github.com/apache/accumulo/pull/3738#discussion_r1327627210


##########
server/gc/src/main/java/org/apache/accumulo/gc/GCRun.java:
##########
@@ -104,21 +107,47 @@ public GCRun(Ample.DataLevel level, ServerContext context) {
   }
 
   @Override
-  public Iterator<String> getCandidates() {
+  public Iterator<GcCandidate> getCandidates() {
     return context.getAmple().getGcCandidates(level);
   }
 
+  /**
+   * Removes gcCandidates from the metadata location depending on type.
+   *
+   * @param gcCandidates Collection of deletion reference candidates to remove.
+   * @param type type of deletion reference candidates.
+   */
+  @Override
+  public void deleteGcCandidates(Collection<GcCandidate> gcCandidates, GcCandidateType type) {
+    if (inSafeMode()) {
+      System.out.println("SAFEMODE: There are " + gcCandidates.size()
+          + " reference file gcCandidates entries marked for deletion from " + level + ".\n"
+          + "          Examine the log files to identify them.\n");
+      log.info("SAFEMODE: Listing all ref file gcCandidates for deletion");
+      for (GcCandidate gcCandidate : gcCandidates) {
+        log.info("SAFEMODE: {}", gcCandidate);
+      }
+      log.info("SAFEMODE: End reference candidates for deletion");
+    }
+
+    if (!config.getBoolean(Property.GC_REMOVE_IN_USE_CANDIDATES)
+        && type.equals(GcCandidateType.INUSE)) {
+      return;
+    }

Review Comment:
   Feel free to ignore this comment because I am not sure if its workable, makes sense, or maybe its already done.  I was wondering if it would make sense to move this check out of GCRun and into GarbageCollectionAlgorithm somehow.  If possible it may make this check more easily testable by the unit test for the GarbageCollectionAlgorithm.  Maybe the check is still here for validation purposes, not sure.



##########
server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectionTest.java:
##########
@@ -863,6 +915,45 @@ public void testMissingTableIds() throws Exception {
         && msg.contains("Saw table IDs in ZK that were not in metadata table:"), msg);
   }
 
+  @Test
+  public void testDeletingInUseReferenceCandidates() throws Exception {

Review Comment:
   If they are not covered elsewhere, would be nice to cover a few more things in this test.
   
    * test directory candidates w/ the in use case
    * test multiple candidates w/ the in use case where some are in use and some are not
    * verify the algorithm passes the expected GcCandidateType and timestamp when it does delete a candidate
   
   Thinking that TestGCE.addCandidate could be modified to return the added candidate so would always know the expected timestamp for the in use case.   TestGCE may need be modified to track more info to verify the expected GcCandidateType and timestamp when deleting a candidate



##########
core/src/main/java/org/apache/accumulo/core/gc/GcCandidate.java:
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.gc;
+
+import java.lang.Object;
+import java.util.Objects;
+
+import org.apache.accumulo.core.metadata.StoredTabletFile;
+
+public class GcCandidate implements Comparable<GcCandidate> {
+  private final Long uid;
+  private final String path;
+
+  public GcCandidate(String path, Long uid) {
+    this.path = path;
+    this.uid = uid;
+  }

Review Comment:
   It does not seem like it expected that the uid would be null.  If that is the case, can change the type to the primitive long.
   
   ```suggestion
     private final long uid;
     private final String path;
   
     public GcCandidate(String path, long uid) {
       this.path = path;
       this.uid = uid;
     }
   ```



##########
server/gc/src/main/java/org/apache/accumulo/gc/GCRun.java:
##########
@@ -335,7 +364,7 @@ public void deleteConfirmedCandidates(SortedMap<String,String> confirmedDeletes)
       log.error("{}", e1.getMessage(), e1);
     }
 
-    context.getAmple().deleteGcCandidates(level, processedDeletes);
+    deleteGcCandidates(processedDeletes, GcCandidateType.VALID);

Review Comment:
   Is per candidate logging happening anywhere that includes the GcCandidate type?



##########
server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionAlgorithm.java:
##########
@@ -156,8 +159,10 @@ private void removeCandidatesInUse(GarbageCollectionEnvironment gce,
 
         dir = makeRelative(dir, 2);
 
-        if (candidateMap.remove(dir) != null) {
+        GcCandidate gcTemp = candidateMap.remove(dir);
+        if (gcTemp != null) {
           log.debug("Candidate was still in use: {}", dir);
+          inUseCandidates.add(gcTemp);

Review Comment:
   This is a set, so its possible the add is unsuccessful.  Would we ever expect the add to fail?  wonder if the return value of the add call should be checked or if a list should/could be used instead of a set.



##########
core/src/main/java/org/apache/accumulo/core/gc/GcCandidate.java:
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.gc;
+
+import java.lang.Object;
+import java.util.Objects;
+
+import org.apache.accumulo.core.metadata.StoredTabletFile;
+
+public class GcCandidate implements Comparable<GcCandidate> {
+  private final Long uid;
+  private final String path;
+
+  public GcCandidate(String path, Long uid) {
+    this.path = path;
+    this.uid = uid;
+  }
+
+  public String getPath() {
+    return path;
+  }
+
+  public Long getUid() {
+    return uid;
+  }
+
+  public String getFileName() {
+    return new StoredTabletFile(path).getFileName();
+  }
+
+  public String getParent() {
+    return new StoredTabletFile(path).getPath().getParent().toString();
+  }

Review Comment:
   These method do not seem to be used.
   
   ```suggestion
   ```



##########
test/src/main/java/org/apache/accumulo/test/functional/GarbageCollectorIT.java:
##########
@@ -329,4 +372,61 @@ private void addEntries(AccumuloClient client) throws Exception {
       }
     }
   }
+
+  private void createAndDeleteUniqueMutation(TableId tableId, Ample.GcCandidateType type) {

Review Comment:
   
   Did not review this yet



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org