You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/08/26 16:31:49 UTC

[GitHub] [accumulo] DomGarguilo commented on a diff in pull request #2792: closes #1377 - ensure all tables are checked ...

DomGarguilo commented on code in PR #2792:
URL: https://github.com/apache/accumulo/pull/2792#discussion_r956209314


##########
server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionAlgorithm.java:
##########
@@ -216,6 +225,46 @@ private long removeBlipCandidates(GarbageCollectionEnvironment gce,
     return blipCount;
   }
 
+  @VisibleForTesting
+  /**
+   *
+   */
+  protected void ensureAllTablesChecked(Set<TableId> tableIdsBefore, Set<TableId> tableIdsSeen,
+      Set<TableId> tableIdsAfter) {
+
+    // if a table was added or deleted during this run, it is acceptable to not
+    // have seen those tables ids when scanning the metadata table. So get the intersection
+    Set<TableId> tableIdsMustHaveSeen = new HashSet<>(tableIdsBefore);
+    tableIdsMustHaveSeen.retainAll(tableIdsAfter);

Review Comment:
   ```suggestion
       Set<TableId> tableIdsMustHaveSeen = Sets.intersection(tableIdsBefore, tableIdsAfter);
   ```
   Could use this here to make it a bit more concise and match the comment.



##########
server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionAlgorithm.java:
##########
@@ -216,6 +225,46 @@ private long removeBlipCandidates(GarbageCollectionEnvironment gce,
     return blipCount;
   }
 
+  @VisibleForTesting
+  /**
+   *
+   */
+  protected void ensureAllTablesChecked(Set<TableId> tableIdsBefore, Set<TableId> tableIdsSeen,
+      Set<TableId> tableIdsAfter) {
+
+    // if a table was added or deleted during this run, it is acceptable to not
+    // have seen those tables ids when scanning the metadata table. So get the intersection
+    Set<TableId> tableIdsMustHaveSeen = new HashSet<>(tableIdsBefore);
+    tableIdsMustHaveSeen.retainAll(tableIdsAfter);
+
+    if (tableIdsMustHaveSeen.isEmpty() && !tableIdsSeen.isEmpty()) {
+      if (!tableIdsBefore.isEmpty() && tableIdsAfter.isEmpty()) {
+        throw new RuntimeException("ZK returned no table ids after scanning for references,"
+            + " maybe all the tables were deleted");
+      } else {
+        // we saw no table ids in ZK but did in the metadata table. This is unexpected.
+        throw new RuntimeException(
+            "Saw no table ids in ZK but did see table ids in metadata table: " + tableIdsSeen);
+      }
+    }
+
+    // From that intersection, remove all the table ids that were seen.
+    tableIdsMustHaveSeen.removeAll(tableIdsSeen);

Review Comment:
   ```suggestion
       tableIdsMustHaveSeen = Sets.difference(tableIdsMustHaveSeen, tableIdsSeen);
   ```
   Will need to use this instead of remove if you decide to use `Sets.intersection` to create `tableIdsMustHaveSeen` (as suggested above)



##########
server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectionTest.java:
##########
@@ -778,4 +838,152 @@ public void bulkImportReplicationRecordsPreventDeletion() throws Exception {
     assertEquals(1, gce.deletes.size());
     assertEquals("hdfs://foo.com:6000/accumulo/tables/2/t-00002/A000002.rf", gce.deletes.get(0));
   }
+
+  @Test
+  public void testMissingTableIds() throws Exception {
+    GarbageCollectionAlgorithm gca = new GarbageCollectionAlgorithm();
+
+    TestGCE gce = new TestGCE();
+
+    gce.candidates.add("hdfs://foo.com:6000/user/foo/tables/a/t-0/F00.rf");
+
+    gce.addFileReference("a", null, "hdfs://foo.com:6000/user/foo/tables/a/t-0/F00.rf");
+    gce.addFileReference("c", null, "hdfs://foo.com:6000/user/foo/tables/c/t-0/F00.rf");
+
+    // add 2 more table references that will not be seen by in the scan
+    gce.tableIds.addAll(Arrays.asList(TableId.of("b"), TableId.of("d")));

Review Comment:
   ```suggestion
       gce.tableIds.addAll(makeUnmodifiableSet("b", "d"));
   ```
   Could use this method here



##########
server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectionTest.java:
##########
@@ -778,4 +838,152 @@ public void bulkImportReplicationRecordsPreventDeletion() throws Exception {
     assertEquals(1, gce.deletes.size());
     assertEquals("hdfs://foo.com:6000/accumulo/tables/2/t-00002/A000002.rf", gce.deletes.get(0));
   }
+
+  @Test
+  public void testMissingTableIds() throws Exception {
+    GarbageCollectionAlgorithm gca = new GarbageCollectionAlgorithm();
+
+    TestGCE gce = new TestGCE();
+
+    gce.candidates.add("hdfs://foo.com:6000/user/foo/tables/a/t-0/F00.rf");
+
+    gce.addFileReference("a", null, "hdfs://foo.com:6000/user/foo/tables/a/t-0/F00.rf");
+    gce.addFileReference("c", null, "hdfs://foo.com:6000/user/foo/tables/c/t-0/F00.rf");
+
+    // add 2 more table references that will not be seen by in the scan
+    gce.tableIds.addAll(Arrays.asList(TableId.of("b"), TableId.of("d")));
+
+    String msg = assertThrows(RuntimeException.class, () -> gca.collect(gce)).getMessage();
+    assertTrue((msg.contains("[b, d]") || msg.contains("[d, b]"))
+        && msg.contains("Saw table IDs in ZK that were not in metadata table:"), msg);
+  }
+
+  // below are tests for potential failure conditions of the GC process. Some of these cases were
+  // observed on clusters. Some were hypothesis based on observations. The result was that
+  // candidate entries were not removed when they should have been and therefore files were
+  // removed from HDFS that were actually still in use
+
+  private Set<TableId> makeUnmodifiableSet(String... args) {
+    Set<TableId> t = new HashSet<>();
+    for (String arg : args) {
+      t.add(TableId.of(arg));
+    }
+    return Collections.unmodifiableSet(t);

Review Comment:
   ```suggestion
       return Arrays.stream(args).map(TableId::of).collect(Collectors.toUnmodifiableSet());
   ```



##########
server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectionTest.java:
##########
@@ -778,4 +838,152 @@ public void bulkImportReplicationRecordsPreventDeletion() throws Exception {
     assertEquals(1, gce.deletes.size());
     assertEquals("hdfs://foo.com:6000/accumulo/tables/2/t-00002/A000002.rf", gce.deletes.get(0));
   }
+
+  @Test
+  public void testMissingTableIds() throws Exception {
+    GarbageCollectionAlgorithm gca = new GarbageCollectionAlgorithm();
+
+    TestGCE gce = new TestGCE();
+
+    gce.candidates.add("hdfs://foo.com:6000/user/foo/tables/a/t-0/F00.rf");
+
+    gce.addFileReference("a", null, "hdfs://foo.com:6000/user/foo/tables/a/t-0/F00.rf");
+    gce.addFileReference("c", null, "hdfs://foo.com:6000/user/foo/tables/c/t-0/F00.rf");
+
+    // add 2 more table references that will not be seen by in the scan
+    gce.tableIds.addAll(Arrays.asList(TableId.of("b"), TableId.of("d")));
+
+    String msg = assertThrows(RuntimeException.class, () -> gca.collect(gce)).getMessage();
+    assertTrue((msg.contains("[b, d]") || msg.contains("[d, b]"))
+        && msg.contains("Saw table IDs in ZK that were not in metadata table:"), msg);
+  }
+
+  // below are tests for potential failure conditions of the GC process. Some of these cases were
+  // observed on clusters. Some were hypothesis based on observations. The result was that
+  // candidate entries were not removed when they should have been and therefore files were
+  // removed from HDFS that were actually still in use
+
+  private Set<TableId> makeUnmodifiableSet(String... args) {
+    Set<TableId> t = new HashSet<>();
+    for (String arg : args) {
+      t.add(TableId.of(arg));
+    }
+    return Collections.unmodifiableSet(t);
+  }
+
+  @Test
+  public void testNormalGCRun() {
+    // happy path, no tables added or removed during this portion and all the tables checked
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("2", "1", "3");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "3", "2");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testTableAddedInMiddle() {
+    // table was added during this portion and we don't see it, should be fine
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("2", "1", "3");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "3", "2", "4");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testTableAddedInMiddleTwo() {
+    // table was added during this portion and we DO see it
+    // Means table was added after candidates were grabbed, so there should be nothing to remove
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("2", "1", "3", "4");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "3", "2", "4");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testTableDeletedInMiddle() {
+    // table was deleted during this portion and we don't see it
+    // this mean any candidates from the deleted table wil stay on the candidate list
+    // and during the delete step they will try to removed
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3", "4");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("2", "1", "4");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "2", "4");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testTableDeletedInMiddleTwo() {
+    // table was deleted during this portion and we DO see it
+    // this mean candidates from the deleted table may get removed from the candidate list
+    // which should be ok, as the delete table function should be responsible for removing those
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3", "4");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("2", "1", "4", "3");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "2", "4");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testMissEntireTable() {
+    // this test simulates missing an entire table when looking for what files are in use
+    // if you add custom splits to the metadata at able boundaries, this can happen with a failed
+    // scan
+    // recall the ~tab:~pr for this first entry of a new table is empty, so there is now way to
+    // check the prior row. If you split a couple of tables in the metadata the table boundary
+    // , say table ids 2,3,4, and then miss scanning table 3 but get 4, it is possible other
+    // consistency checks will miss this
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3", "4");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("1", "2", "4");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "2", "3", "4");
+
+    GarbageCollectionAlgorithm gca = new GarbageCollectionAlgorithm();
+    String msg = assertThrows(RuntimeException.class,
+        () -> gca.ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter)).getMessage();
+    assertTrue(msg.startsWith("Saw table IDs in ZK that were not in metadata table:"));
+
+  }
+
+  @Test
+  public void testZKHadNoTables() {
+    // this test simulates getting nothing from ZK for table ids, which should not happen,
+    // but just in case let's test
+    Set<TableId> tablesBefore = makeUnmodifiableSet();
+    Set<TableId> tablesSeen = makeUnmodifiableSet("1", "2");
+    Set<TableId> tablesAfter = makeUnmodifiableSet();
+
+    GarbageCollectionAlgorithm gca = new GarbageCollectionAlgorithm();
+    String msg = assertThrows(RuntimeException.class,
+        () -> gca.ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter)).getMessage();
+    assertTrue(msg.startsWith("Saw no table ids in ZK but did see table ids in metadata table:"));
+  }
+
+  @Test
+  public void testMissingTableAndTableAdd() {
+    // simulates missing a table when checking references, and a table being added
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("1", "2", "4");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "2", "3", "4");
+
+    GarbageCollectionAlgorithm gca = new GarbageCollectionAlgorithm();
+    String msg = assertThrows(RuntimeException.class,
+        () -> gca.ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter)).getMessage();
+    assertTrue(msg.equals("Saw table IDs in ZK that were not in metadata table:  [3]"));
+  }
+
+  @Test
+  public void testMissingTableAndTableDeleted() {
+    // simulates missing a table when checking references, and a table being deleted
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3", "4");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("1", "2", "4");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "2", "3");
+
+    GarbageCollectionAlgorithm gca = new GarbageCollectionAlgorithm();
+    String msg = assertThrows(RuntimeException.class,
+        () -> gca.ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter)).getMessage();
+    assertTrue(msg.equals("Saw table IDs in ZK that were not in metadata table:  [3]"));

Review Comment:
   ```suggestion
       assertEquals("Saw table IDs in ZK that were not in metadata table:  [3]", msg);
   ```
   Can be simplified to assertEquals



##########
server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectionTest.java:
##########
@@ -778,4 +838,152 @@ public void bulkImportReplicationRecordsPreventDeletion() throws Exception {
     assertEquals(1, gce.deletes.size());
     assertEquals("hdfs://foo.com:6000/accumulo/tables/2/t-00002/A000002.rf", gce.deletes.get(0));
   }
+
+  @Test
+  public void testMissingTableIds() throws Exception {
+    GarbageCollectionAlgorithm gca = new GarbageCollectionAlgorithm();
+
+    TestGCE gce = new TestGCE();
+
+    gce.candidates.add("hdfs://foo.com:6000/user/foo/tables/a/t-0/F00.rf");
+
+    gce.addFileReference("a", null, "hdfs://foo.com:6000/user/foo/tables/a/t-0/F00.rf");
+    gce.addFileReference("c", null, "hdfs://foo.com:6000/user/foo/tables/c/t-0/F00.rf");
+
+    // add 2 more table references that will not be seen by in the scan
+    gce.tableIds.addAll(Arrays.asList(TableId.of("b"), TableId.of("d")));
+
+    String msg = assertThrows(RuntimeException.class, () -> gca.collect(gce)).getMessage();
+    assertTrue((msg.contains("[b, d]") || msg.contains("[d, b]"))
+        && msg.contains("Saw table IDs in ZK that were not in metadata table:"), msg);
+  }
+
+  // below are tests for potential failure conditions of the GC process. Some of these cases were
+  // observed on clusters. Some were hypothesis based on observations. The result was that
+  // candidate entries were not removed when they should have been and therefore files were
+  // removed from HDFS that were actually still in use
+
+  private Set<TableId> makeUnmodifiableSet(String... args) {
+    Set<TableId> t = new HashSet<>();
+    for (String arg : args) {
+      t.add(TableId.of(arg));
+    }
+    return Collections.unmodifiableSet(t);
+  }
+
+  @Test
+  public void testNormalGCRun() {
+    // happy path, no tables added or removed during this portion and all the tables checked
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("2", "1", "3");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "3", "2");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testTableAddedInMiddle() {
+    // table was added during this portion and we don't see it, should be fine
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("2", "1", "3");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "3", "2", "4");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testTableAddedInMiddleTwo() {
+    // table was added during this portion and we DO see it
+    // Means table was added after candidates were grabbed, so there should be nothing to remove
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("2", "1", "3", "4");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "3", "2", "4");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testTableDeletedInMiddle() {
+    // table was deleted during this portion and we don't see it
+    // this mean any candidates from the deleted table wil stay on the candidate list
+    // and during the delete step they will try to removed
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3", "4");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("2", "1", "4");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "2", "4");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testTableDeletedInMiddleTwo() {
+    // table was deleted during this portion and we DO see it
+    // this mean candidates from the deleted table may get removed from the candidate list
+    // which should be ok, as the delete table function should be responsible for removing those
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3", "4");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("2", "1", "4", "3");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "2", "4");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testMissEntireTable() {
+    // this test simulates missing an entire table when looking for what files are in use
+    // if you add custom splits to the metadata at able boundaries, this can happen with a failed
+    // scan
+    // recall the ~tab:~pr for this first entry of a new table is empty, so there is now way to
+    // check the prior row. If you split a couple of tables in the metadata the table boundary
+    // , say table ids 2,3,4, and then miss scanning table 3 but get 4, it is possible other
+    // consistency checks will miss this
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3", "4");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("1", "2", "4");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "2", "3", "4");
+
+    GarbageCollectionAlgorithm gca = new GarbageCollectionAlgorithm();
+    String msg = assertThrows(RuntimeException.class,
+        () -> gca.ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter)).getMessage();
+    assertTrue(msg.startsWith("Saw table IDs in ZK that were not in metadata table:"));
+
+  }
+
+  @Test
+  public void testZKHadNoTables() {
+    // this test simulates getting nothing from ZK for table ids, which should not happen,
+    // but just in case let's test
+    Set<TableId> tablesBefore = makeUnmodifiableSet();
+    Set<TableId> tablesSeen = makeUnmodifiableSet("1", "2");
+    Set<TableId> tablesAfter = makeUnmodifiableSet();
+
+    GarbageCollectionAlgorithm gca = new GarbageCollectionAlgorithm();
+    String msg = assertThrows(RuntimeException.class,
+        () -> gca.ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter)).getMessage();
+    assertTrue(msg.startsWith("Saw no table ids in ZK but did see table ids in metadata table:"));
+  }
+
+  @Test
+  public void testMissingTableAndTableAdd() {
+    // simulates missing a table when checking references, and a table being added
+    Set<TableId> tablesBefore = makeUnmodifiableSet("1", "2", "3");
+    Set<TableId> tablesSeen = makeUnmodifiableSet("1", "2", "4");
+    Set<TableId> tablesAfter = makeUnmodifiableSet("1", "2", "3", "4");
+
+    GarbageCollectionAlgorithm gca = new GarbageCollectionAlgorithm();
+    String msg = assertThrows(RuntimeException.class,
+        () -> gca.ensureAllTablesChecked(tablesBefore, tablesSeen, tablesAfter)).getMessage();
+    assertTrue(msg.equals("Saw table IDs in ZK that were not in metadata table:  [3]"));

Review Comment:
   ```suggestion
       assertEquals("Saw table IDs in ZK that were not in metadata table:  [3]", msg);
   ```
   Can be simplified to assertEquals



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org