You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/07/27 13:40:16 UTC

[GitHub] [accumulo] milleruntime commented on a change in pull request #2214: Remove continue point from Garbage Collector

milleruntime commented on a change in pull request #2214:
URL: https://github.com/apache/accumulo/pull/2214#discussion_r677432267



##########
File path: server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectionTest.java
##########
@@ -51,13 +58,29 @@
     TreeMap<String,Status> filesToReplicate = new TreeMap<>();
 
     @Override
-    public boolean getCandidates(String continuePoint, List<String> ret) {
-      Iterator<String> iter = candidates.tailSet(continuePoint, false).iterator();
-      while (iter.hasNext() && ret.size() < 3) {
-        ret.add(iter.next());
+    public void processCandidates() throws TableNotFoundException, IOException {
+
+      Iterator<String> candidatesIter = candidates.iterator();
+
+      while (candidatesIter.hasNext()) {
+        List<String> candidatesBatch = readCandidatesThatFitInMemory(candidatesIter);
+        new GarbageCollectionAlgorithm().collectBatch(this, candidatesBatch);
       }
+      // Remove all candidates that were tagged for deletion now that the processing of
+      // candidates is complete for this round. This was removed from the 'delete' method
+      // due to that causing a ConcurrentModificationException. T

Review comment:
       Extra character in comment.
   ```suggestion
         // due to that causing a ConcurrentModificationException.
   ```

##########
File path: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionEnvironment.java
##########
@@ -40,19 +39,10 @@
 public interface GarbageCollectionEnvironment {
 
   /**
-   * Return a list of paths to files and dirs which are candidates for deletion from a given table,
-   * {@link RootTable#NAME} or {@link MetadataTable#NAME}
-   *
-   * @param continuePoint
-   *          A row to resume from if a previous invocation was stopped due to finding an extremely
-   *          large number of candidates to remove which would have exceeded memory limitations
-   * @param candidates
-   *          A collection of candidates files for deletion, may not be the complete collection of
-   *          files for deletion at this point in time
-   * @return true if the results are short due to insufficient memory, otherwise false
+   * Process all possible deletion candidates for a given table, deleting candidates that meet all
+   * necessary conditions.
    */
-  boolean getCandidates(String continuePoint, List<String> candidates)
-      throws TableNotFoundException;
+  void processCandidates() throws TableNotFoundException, IOException;

Review comment:
       I wonder if this method should be split up. Previously we had one method to gather the candidates and then another that processed them.

##########
File path: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionAlgorithm.java
##########
@@ -290,30 +290,26 @@ private void deleteConfirmed(GarbageCollectionEnvironment gce,
     cleanUpDeletedTableDirs(gce, candidateMap);
   }
 
-  public void collect(GarbageCollectionEnvironment gce) throws TableNotFoundException, IOException {
-
-    String lastCandidate = "";
-
-    boolean outOfMemory = true;
-    while (outOfMemory) {
-      List<String> candidates = new ArrayList<>();
+  /**
+   * Given a sub-list of possible delection candidates, process and remove valid deletion
+   * candidates.
+   */
+  public void collectBatch(GarbageCollectionEnvironment gce, List<String> currentBatch)
+      throws TableNotFoundException, IOException {
 
-      outOfMemory = getCandidates(gce, lastCandidate, candidates);
-
-      if (candidates.isEmpty())
-        break;
-      else
-        lastCandidate = candidates.get(candidates.size() - 1);
+    long origSize = currentBatch.size();
+    gce.incrementCandidatesStat(origSize);
 
-      long origSize = candidates.size();
-      gce.incrementCandidatesStat(origSize);
+    SortedMap<String,String> candidateMap = makeRelative(currentBatch);
 
-      SortedMap<String,String> candidateMap = makeRelative(candidates);
+    confirmDeletesTrace(gce, candidateMap);
+    gce.incrementInUseStat(origSize - candidateMap.size());
 
-      confirmDeletesTrace(gce, candidateMap);
-      gce.incrementInUseStat(origSize - candidateMap.size());
+    deleteConfirmed(gce, candidateMap);
+  }

Review comment:
       The previous `collect()` method was confusing but it was the only public method of the `GarbageCollectionAlgorithm` so it was at least easier to follow from the one entry point. Since you are refactoring the methods, it might be a good opportunity to clean it up and make it easier to follow. The GC was already confusing but the way you split up the methods, is more confusing to me. It looks like you have the correct business logic between the methods but I think it could be organized better.
   
   It seems the simplest solution would be to keep the same methods but just drop the continue point. Was there a reason you had to create new methods? I saw your comment in the tests about ConcurrentModificationException but I am not seeing where that could happen.

##########
File path: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionAlgorithm.java
##########
@@ -290,30 +290,26 @@ private void deleteConfirmed(GarbageCollectionEnvironment gce,
     cleanUpDeletedTableDirs(gce, candidateMap);
   }
 
-  public void collect(GarbageCollectionEnvironment gce) throws TableNotFoundException, IOException {
-
-    String lastCandidate = "";
-
-    boolean outOfMemory = true;
-    while (outOfMemory) {
-      List<String> candidates = new ArrayList<>();
+  /**
+   * Given a sub-list of possible delection candidates, process and remove valid deletion

Review comment:
       ```suggestion
      * Given a sub-list of possible deletion candidates, process and remove valid deletion
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org