You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by "GeorgeJahad (via GitHub)" <gi...@apache.org> on 2023/05/24 20:33:36 UTC

[GitHub] [ozone] GeorgeJahad opened a new pull request, #4770: HDDS_8208. Handle large tarballs in bootstrapping.

GeorgeJahad opened a new pull request, #4770:
URL: https://github.com/apache/ozone/pull/4770

   ## What changes were proposed in this pull request?
   
   The incremental bootstrapping feature currently in master, only treats the active fs sst files incrementally.  This PR extends that functionality to the om snapshot related sst files.
   
   The feature is currently implemented with an exclude list.  The follower generates a list of the active sst it has currently received from the leader, and passes that list to the leader as part of the http ratis snapshot request.  The leader then excludes those sst files from next tarball it sneds to the follower.
   
   In this PR, the follower adds the existing om snapshot sst files to the exclude list.  The leader than uses that list to determine which sst files to exclude as well as which can be used as targets for hard links.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-8208
   
   ## How was this patch tested?
   Unit/Integration tests added
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] smengcl merged pull request #4770: HDDS-8208. [SNAPSHOT] Allow om snapshot bootstrap tarballs to be incremental.

Posted by "smengcl (via GitHub)" <gi...@apache.org>.
smengcl merged PR #4770:
URL: https://github.com/apache/ozone/pull/4770


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] smengcl commented on pull request #4770: HDDS-8208. [SNAPSHOT] Allow om snapshot bootstrap tarballs to be incremental.

Posted by "smengcl (via GitHub)" <gi...@apache.org>.
smengcl commented on PR #4770:
URL: https://github.com/apache/ozone/pull/4770#issuecomment-1594264751

   Thanks @GeorgeJahad for the improvement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] smengcl commented on a diff in pull request #4770: HDDS-8208. [SNAPSHOT] Allow om snapshot bootstrap tarballs to be incremental.

Posted by "smengcl (via GitHub)" <gi...@apache.org>.
smengcl commented on code in PR #4770:
URL: https://github.com/apache/ozone/pull/4770#discussion_r1230682142


##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/RDBSnapshotProvider.java:
##########
@@ -60,6 +60,7 @@ public abstract class RDBSnapshotProvider implements Closeable {
   private final AtomicReference<String> lastLeaderRef;
   private final AtomicLong numDownloaded;
   private FaultInjector injector;
+  private final AtomicLong initCount;

Review Comment:
   ```suggestion
     // The number of times init() is called
     private final AtomicLong initCount;
   ```



##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/RDBSnapshotProvider.java:
##########
@@ -131,7 +135,8 @@ public DBCheckpoint downloadDBSnapshotFromLeader(String leaderNodeID)
    *
    * @param currentLeader the ID of leader node
    */
-  private void checkLeaderConsistent(String currentLeader) {
+  @VisibleForTesting
+  void checkLeaderConsistent(String currentLeader) throws IOException {

Review Comment:
   nit
   
   ```suggestion
     void checkLeaderConsistency(String currentLeader) throws IOException {
   ```



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java:
##########
@@ -3837,15 +3839,17 @@ private void moveCheckpointFiles(File oldDB, Path checkpointPath, File dbDir,
       // an inconsistent state and this marker file will fail OM from
       // starting up.
       Files.createFile(markerFile);
-      // Copy the candidate DB to real DB
-      org.apache.commons.io.FileUtils.copyDirectory(checkpointPath.toFile(),
+      // Link each of the candidate DB files to real DB directory.  This
+      // preserves the links that already exist between files in the
+      // candidate db.
+      OmSnapshotUtils.linkFiles(checkpointPath.toFile(),
           oldDB);
       moveOmSnapshotData(oldDB.toPath(), dbSnapshotsDir.toPath());
       Files.deleteIfExists(markerFile);
     } catch (IOException e) {
       LOG.error("Failed to move downloaded DB checkpoint {} to metadata " +
-              "directory {}. Resetting to original DB.", checkpointPath,
-          oldDB.toPath());
+              "directory {}.  Exception: {}. Resetting to original DB.",

Review Comment:
   nit: extra space
   ```suggestion
                 "directory {}. Exception: {}. Resetting to original DB.",
   ```



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBCheckpointServlet.java:
##########
@@ -234,24 +253,69 @@ private void processDir(Path dir, Map<Object, Path> copyFiles,
             LOG.debug("Skipping unneeded file: " + file);
             continue;
           }
-          processDir(file, copyFiles, hardLinkFiles, snapshotPaths);
+          processDir(file, copyFiles, hardLinkFiles, toExcludeFiles,
+              snapshotPaths, excluded);
         } else {
-          processFile(file, copyFiles, hardLinkFiles);
+          processFile(file, copyFiles, hardLinkFiles, toExcludeFiles, excluded);
         }
       }
     }
   }
 
-  private void processFile(Path file, Map<Object, Path> copyFiles,
-                           Map<Path, Path> hardLinkFiles) throws IOException {
-    // Get the inode.
-    Object key = OmSnapshotUtils.getINode(file);
-    // If we already have the inode, store as hard link.
-    if (copyFiles.containsKey(key)) {
-      hardLinkFiles.put(file, copyFiles.get(key));
+  /**
+   * Takes a db file and determines whether it should be included in
+   * the tarball, or added as a link, or excluded altogether.
+   * Uses the toExcludeFiles list to know what already
+   * exists on the follower.
+   * @param file The db file to be processed.
+   * @param copyFiles The db files to be added to tarball.
+   * @param hardLinkFiles The db files to be added as hard links.
+   * @param toExcludeFiles The db files to be excluded from tarball.
+   * @param excluded The list of db files that actually were excluded.
+   */
+  @VisibleForTesting
+  public static void processFile(Path file, Set<Path> copyFiles,
+                           Map<Path, Path> hardLinkFiles,
+                           Set<Path> toExcludeFiles,
+                           List<String> excluded) {
+    if (toExcludeFiles.contains(file)) {
+      excluded.add(file.toString());
     } else {
-      copyFiles.put(key, file);
+      Path fileNamePath = file.getFileName();
+      if (fileNamePath == null) {
+        throw new NullPointerException("Bad file: " + file);
+      }
+      String fileName = fileNamePath.toString();
+      if (fileName.endsWith(ROCKSDB_SST_SUFFIX)) {
+        // If same as existing excluded file, add a link for it.
+        Path linkPath = findLinkPath(toExcludeFiles, fileName);
+        if (linkPath != null) {
+          hardLinkFiles.put(file, linkPath);
+        } else {
+          // If already in tarball add a link for it.
+          linkPath = findLinkPath(copyFiles, fileName);
+          if (linkPath != null) {
+            hardLinkFiles.put(file, linkPath);
+          } else {
+            // Add to tarball.
+            copyFiles.add(file);
+          }
+        }
+      } else { // Not sst file.
+        copyFiles.add(file);

Review Comment:
   ```suggestion
         } else {
           // Not sst file.
           copyFiles.add(file);
   ```



##########
hadoop-hdds/framework/src/test/java/org/apache/hadoop/hdds/utils/TestRDBSnapshotProvider.java:
##########
@@ -236,4 +238,28 @@ public void insertRandomData(RDBStore dbStore, int familyIndex)
       throw new IOException(e);
     }
   }
+
+  @Test
+  public void testCheckLeaderConsistent() throws IOException {

Review Comment:
   nit
   ```suggestion
     public void testCheckLeaderConsistency() throws IOException {
   ```



##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OMConfigKeys.java:
##########
@@ -233,7 +233,7 @@ private OMConfigKeys() {
       "ozone.om.snapshot.provider.request.timeout";
   public static final TimeDuration
       OZONE_OM_SNAPSHOT_PROVIDER_REQUEST_TIMEOUT_DEFAULT =
-      TimeDuration.valueOf(5000, TimeUnit.MILLISECONDS);
+      TimeDuration.valueOf(300000, TimeUnit.MILLISECONDS);

Review Comment:
   5s timeout to 5m is quite a jump. If this default config value revision is intended (i.e. not just for debugging), pls change `ozone-default.xml` correspondingly:
   
   https://github.com/GeorgeJahad/ozone/blob/f758ab2cd96d92d2d73d51286bdd3c87840bceec/hadoop-hdds/common/src/main/resources/ozone-default.xml#L2008-L2010



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBCheckpointServlet.java:
##########
@@ -234,24 +253,69 @@ private void processDir(Path dir, Map<Object, Path> copyFiles,
             LOG.debug("Skipping unneeded file: " + file);
             continue;
           }
-          processDir(file, copyFiles, hardLinkFiles, snapshotPaths);
+          processDir(file, copyFiles, hardLinkFiles, toExcludeFiles,
+              snapshotPaths, excluded);
         } else {
-          processFile(file, copyFiles, hardLinkFiles);
+          processFile(file, copyFiles, hardLinkFiles, toExcludeFiles, excluded);
         }
       }
     }
   }
 
-  private void processFile(Path file, Map<Object, Path> copyFiles,
-                           Map<Path, Path> hardLinkFiles) throws IOException {
-    // Get the inode.
-    Object key = OmSnapshotUtils.getINode(file);
-    // If we already have the inode, store as hard link.
-    if (copyFiles.containsKey(key)) {
-      hardLinkFiles.put(file, copyFiles.get(key));
+  /**
+   * Takes a db file and determines whether it should be included in
+   * the tarball, or added as a link, or excluded altogether.
+   * Uses the toExcludeFiles list to know what already
+   * exists on the follower.
+   * @param file The db file to be processed.
+   * @param copyFiles The db files to be added to tarball.
+   * @param hardLinkFiles The db files to be added as hard links.
+   * @param toExcludeFiles The db files to be excluded from tarball.
+   * @param excluded The list of db files that actually were excluded.
+   */
+  @VisibleForTesting
+  public static void processFile(Path file, Set<Path> copyFiles,
+                           Map<Path, Path> hardLinkFiles,
+                           Set<Path> toExcludeFiles,
+                           List<String> excluded) {

Review Comment:
   nit: indentation
   ```suggestion
     public static void processFile(Path file, Set<Path> copyFiles,
                                    Map<Path, Path> hardLinkFiles,
                                    Set<Path> toExcludeFiles,
                                    List<String> excluded) {
   ```



##########
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestOmSnapshotUtils.java:
##########
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.om.snapshot;
+
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.File;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.apache.hadoop.ozone.om.snapshot.OmSnapshotUtils.getINode;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+/**
+ * Class to test snapshot utilities.
+ */
+public class TestOmSnapshotUtils {
+
+  /**
+   * Test linkFiles().
+   */
+  @Test
+  public void testLinkFiles(@TempDir File tempDir) throws Exception {
+
+    // Create the tree to link from
+    File dir1 = new File(tempDir, "tree1/dir1");
+    File dir2 = new File(tempDir, "tree1/dir2");
+    File tree1 = new File(tempDir, "tree1");
+    assertTrue(dir1.mkdirs());
+    assertTrue(dir2.mkdirs());
+    File f1 = new File(tempDir, "tree1/dir1/f1");
+    Files.write(f1.toPath(), "dummyData".getBytes(UTF_8));
+
+    // Create pointers to expected files/links.
+    File tree2 = new File(tempDir, "tree2");
+    File f1Link = new File(tempDir, "tree2/dir1/f1");
+
+    // Expected files/links shouldn't exist yet.
+    assertFalse(tree2.exists());
+    assertFalse(f1Link.exists());
+
+    OmSnapshotUtils.linkFiles(tree1, tree2);
+
+    // Expected files/links should exist now.
+    assertTrue(tree2.exists());
+    assertTrue(f1Link.exists());
+    assertEquals(getINode(f1.toPath()), getINode(f1Link.toPath()));
+
+    Set<String> tree1Files = Files.walk(tree1.toPath()).
+        map(Path::toString).
+        map((s) -> s.replace("tree1", "tree2")).
+        collect(Collectors.toSet());
+    Set<String> tree2Files = Files.walk(tree2.toPath()).
+        map(Path::toString).collect(Collectors.toSet());
+
+    assertEquals(tree1Files, tree2Files);
+  }

Review Comment:
   nit: clean it up by recursively deleting `tempDir` at the end?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] GeorgeJahad commented on pull request #4770: HDDS-8208. [SNAPSHOT] Allow om snapshot bootstrap tarballs to be incremental.

Posted by "GeorgeJahad (via GitHub)" <gi...@apache.org>.
GeorgeJahad commented on PR #4770:
URL: https://github.com/apache/ozone/pull/4770#issuecomment-1593966410

   @smengcl  the checkstyle error is actually because I haven't merged to the latest master.  Fixing now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org