You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2023/01/19 00:25:11 UTC

[GitHub] [ozone] hemantk-12 opened a new pull request, #4190: HDDS-7800: Use persistent map to generate snapshot diff report

hemantk-12 opened a new pull request, #4190:
URL: https://github.com/apache/ozone/pull/4190

   ## What changes were proposed in this pull request?
   This change is to use persistent map (backed by RocksDB) to store intermediate objects to generate Snapshot diff.
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-7800
   
   ## How was this patch tested?
   Manually using snapshot diff command
   ```
   ozone sh snapshot snapshotDiff /${volume}/${bucket} snapshot1 snapshot2
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] DaveTeng0 commented on a diff in pull request #4190: HDDS-7800. [Snapshot] Use persistent map and set to generate snapshot diff report

Posted by "DaveTeng0 (via GitHub)" <gi...@apache.org>.
DaveTeng0 commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1102147621


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -113,54 +134,154 @@ public SnapshotDiffReport getSnapshotDiffReport(final String volume,
     final BucketLayout bucketLayout = getBucketLayout(volume, bucket,
         fromSnapshot.getMetadataManager());
 
-    /*
-     * The reason for having ObjectID to KeyName mapping instead of OmKeyInfo
-     * is to reduce the memory footprint.
-     */
-    final Map<Long, String> oldObjIdToKeyMap = new HashMap<>();
-    // Long --> const. length
-    // String --> var. length "/dir1/dir2/dir3/dir4/dir5/key1"
-    final Map<Long, String> newObjIdToKeyMap = new HashMap<>();
-
-    final Set<Long> objectIDsToCheck = new HashSet<>();
-
-    // add to object ID map for key/file.
-
-    final Table<String, OmKeyInfo> fsKeyTable = fromSnapshot
-        .getMetadataManager().getKeyTable(bucketLayout);
-    final Table<String, OmKeyInfo> tsKeyTable = toSnapshot
-        .getMetadataManager().getKeyTable(bucketLayout);
-    final Set<String> deltaFilesForKeyOrFileTable =
-        getDeltaFiles(fromSnapshot, toSnapshot,
-            Collections.singletonList(fsKeyTable.getName()),
-            fsInfo, tsInfo, volume, bucket);
-
-    addToObjectIdMap(fsKeyTable, tsKeyTable, deltaFilesForKeyOrFileTable,
-        oldObjIdToKeyMap, newObjIdToKeyMap, objectIDsToCheck, false);
-
-    if (bucketLayout.isFileSystemOptimized()) {
-      // add to object ID map for directory.
-      final Table<String, OmDirectoryInfo> fsDirTable =
-          fromSnapshot.getMetadataManager().getDirectoryTable();
-      final Table<String, OmDirectoryInfo> tsDirTable =
-          toSnapshot.getMetadataManager().getDirectoryTable();
-      final Set<String> deltaFilesForDirTable =
+    // TODO: This should comes from request itself.
+    String requestId = UUID.randomUUID().toString();
+
+    ColumnFamilyHandle fromSnapshotColumnFamily = null;
+    ColumnFamilyHandle toSnapshotColumnFamily = null;
+    ColumnFamilyHandle objectIDsColumnFamily = null;
+    ColumnFamilyHandle diffReportColumnFamily = null;
+
+    try {
+      // RequestId is prepended to column family name to make it unique
+      // for request.
+      fromSnapshotColumnFamily = db.get().createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-fromSnapshot"),
+              new ManagedColumnFamilyOptions()));
+      toSnapshotColumnFamily = db.get().createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-toSnapshot"),
+              new ManagedColumnFamilyOptions()));
+      objectIDsColumnFamily = db.get().createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-objectIDs"),
+              new ManagedColumnFamilyOptions()));
+      diffReportColumnFamily = db.get().createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-diffReport"),
+              new ManagedColumnFamilyOptions()));
+
+      // ObjectId to keyName map to keep key info for fromSnapshot.
+      // objectIdToKeyNameMap is used to identify what keys were touched
+      // in which snapshot and to know the difference if operation was
+      // creation, deletion, modify or rename.
+      // Stores only keyName instead of OmKeyInfo to reduce the memory
+      // footprint.
+      final PersistentMap<Long, String> objectIdToKeyNameMapForFromSnapshot =
+          new RocksDbPersistentMap<>(db,
+              fromSnapshotColumnFamily,
+              codecRegistry,
+              Long.class,
+              String.class);
+
+      // ObjectId to keyName map to keep key info for toSnapshot.
+      final PersistentMap<Long, String> objectIdToKeyNameMapForToSnapshot =
+          new RocksDbPersistentMap<>(db,
+              toSnapshotColumnFamily,
+              codecRegistry,
+              Long.class,
+              String.class);
+
+      // Set of unique objectId between fromSnapshot and toSnapshot.
+      final PersistentSet<Long> objectIDsToCheckMap =

Review Comment:
   oh just a small thing~ this is a set actually?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] DaveTeng0 commented on a diff in pull request #4190: HDDS-7800. [Snapshot] Use persistent map and set to generate snapshot diff report

Posted by "DaveTeng0 (via GitHub)" <gi...@apache.org>.
DaveTeng0 commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1100915722


##########
hadoop-ozone/interface-storage/src/main/java/org/apache/hadoop/ozone/om/codec/OmDBDiffReportEntryCodec.java:
##########
@@ -0,0 +1,53 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.om.codec;
+
+import java.io.IOException;
+import org.apache.hadoop.hdds.utils.db.Codec;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos;
+import org.apache.hadoop.ozone.snapshot.SnapshotDiffReport.DiffReportEntry;
+
+import static com.google.common.base.Preconditions.checkNotNull;
+
+/**
+ * Codec to encode DiffReportEntry as byte array.
+ */
+public class OmDBDiffReportEntryCodec implements Codec<DiffReportEntry> {
+
+  @Override
+  public byte[] toPersistedFormat(DiffReportEntry object)
+      throws IOException {
+    checkNotNull(object, "Null object can't be converted to byte array.");
+    return object.toProtobuf().toByteArray();
+  }
+
+  @Override
+  public DiffReportEntry fromPersistedFormat(byte[] rawData)
+      throws IOException {
+    checkNotNull(rawData, "Null byte array can't be converted to " +
+        "real object.");
+    return DiffReportEntry.fromProtobuf(
+        OzoneManagerProtocolProtos.DiffReportEntryProto.parseFrom(rawData));
+  }
+
+  @Override
+  public DiffReportEntry copyObject(DiffReportEntry object) {
+    // Note: Not really a "copy". from OmDBDiffReportEntryCodec
+    return object;

Review Comment:
   i guess this method will be implemented in other ticket~? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] smengcl merged pull request #4190: HDDS-7800. [Snapshot] Use persistent map and set to generate snapshot diff report

Posted by "smengcl (via GitHub)" <gi...@apache.org>.
smengcl merged PR #4190:
URL: https://github.com/apache/ozone/pull/4190


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] smengcl commented on a diff in pull request #4190: HDDS-7800. Use persistent map and set to generate snapshot diff report

Posted by "smengcl (via GitHub)" <gi...@apache.org>.
smengcl commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1096177215


##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OMConfigKeys.java:
##########
@@ -424,4 +424,6 @@ private OMConfigKeys() {
       = "ozone.om.enable.ofs.shared.tmp.dir";
   public static final boolean OZONE_OM_ENABLE_OFS_SHARED_TMP_DIR_DEFAULT
       = false;
+  public static final String OZONE_OM_SNAPSHOT_DIFF_DB_DIR
+      = "ozone.om.snapshot.diff.db.dirs";

Review Comment:
   Remove `s` at the end?
   
   ```suggestion
         = "ozone.om.snapshot.diff.db.dir";
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hemantk-12 commented on a diff in pull request #4190: HDDS-7800. Use persistent map to generate snapshot diff report

Posted by "hemantk-12 (via GitHub)" <gi...@apache.org>.
hemantk-12 commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1091295980


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java:
##########
@@ -268,4 +280,29 @@ private void verifySnapshotInfoForSnapDiff(final SnapshotInfo fromSnapshot,
           " should be older than to toSnapshot:" + toSnapshot.getName());
     }
   }
+
+  private RocksDB createRocksDbForSnapshotDiff(OzoneConfiguration config) {

Review Comment:
   Yes, for now. We may revisit this when we work on making snapdiff HA or guarantee the persistence of diff report.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hemantk-12 commented on a diff in pull request #4190: HDDS-7800. Use persistent map and set to generate snapshot diff report

Posted by "hemantk-12 (via GitHub)" <gi...@apache.org>.
hemantk-12 commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1094935941


##########
hadoop-ozone/ozone-manager/pom.xml:
##########
@@ -271,6 +271,44 @@ https://maven.apache.org/xsd/maven-4.0.0.xsd">
           <excludeFilterFile>${basedir}/dev-support/findbugsExcludeFile.xml</excludeFilterFile>
         </configuration>
       </plugin>
+      <plugin>
+        <artifactId>maven-enforcer-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>depcheck</id>
+            <phase></phase>
+          </execution>
+          <execution>
+            <id>banned-rocksdb-imports</id>
+            <phase>process-sources</phase>
+            <goals>
+              <goal>enforce</goal>
+            </goals>
+            <configuration>
+              <rules>
+                <RestrictImports>
+                  <includeTestCode>false</includeTestCode>
+                  <reason>Use managed RocksObjects under org.apache.hadoop.hdds.utils.db.managed instead.</reason>
+                  <!-- By default, ban all the classes in org.rocksdb -->
+                  <bannedImport>org.rocksdb.**</bannedImport>
+                  <allowedImports>
+                    <!-- Allow these imports for snapshot diff. We can't use managed DB for Snapshot diff because

Review Comment:
   Thanks @duongkame for the suggestion. Updated to use Managed objects.



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java:
##########
@@ -268,4 +280,29 @@ private void verifySnapshotInfoForSnapDiff(final SnapshotInfo fromSnapshot,
           " should be older than to toSnapshot:" + toSnapshot.getName());
     }
   }
+
+  private RocksDB createRocksDbForSnapshotDiff(OzoneConfiguration config) {
+    final Options options = new Options().setCreateIfMissing(true);
+
+    final File dbDirPath =
+        ServerUtils.getDBPath(config, OZONE_OM_SNAPSHOT_DIFF_DB_DIR);
+
+    String dbPath = Paths.get(dbDirPath.toString(), OM_SNAPSHOT_DIFF_DB_NAME)
+        .toFile()
+        .getAbsolutePath();
+
+    try {
+      return RocksDB.open(options, dbPath);

Review Comment:
   Added as suggested.



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java:
##########
@@ -35,6 +38,8 @@
 import org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager;
 import org.apache.hadoop.ozone.snapshot.SnapshotDiffReport;
 import org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer;
+import org.rocksdb.Options;

Review Comment:
   Updated to use Managed objects. 



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/RocksDbPersistentList.java:
##########
@@ -0,0 +1,139 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.om.snapshot;
+
+import java.io.IOException;
+import java.util.Iterator;
+import org.apache.hadoop.hdds.utils.db.CodecRegistry;
+import org.rocksdb.ColumnFamilyHandle;
+import org.rocksdb.RocksDB;

Review Comment:
   Updated to use Managed objects.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] smengcl commented on a diff in pull request #4190: HDDS-7800. Use persistent map to generate snapshot diff report

Posted by "smengcl (via GitHub)" <gi...@apache.org>.
smengcl commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1087485822


##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConsts.java:
##########
@@ -556,6 +556,7 @@ private OzoneConsts() {
   public static final String OM_SNAPSHOT_NAME = "snapshotName";
   public static final String OM_SNAPSHOT_DIR = "db.snapshots";
   public static final String OM_SNAPSHOT_INDICATOR = ".snapshot";
+  public static final String OM_SNAPSHOT_DIFF_DB_NAME = "db.snapshots.diff";

Review Comment:
   What about
   ```suggestion
     public static final String OM_SNAPSHOT_DIFF_DB_NAME = "db.snapdiffs";
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] duongkame commented on a diff in pull request #4190: HDDS-7800. Use persistent map to generate snapshot diff report

Posted by GitBox <gi...@apache.org>.
duongkame commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1081958236


##########
hadoop-ozone/ozone-manager/pom.xml:
##########
@@ -271,6 +271,44 @@ https://maven.apache.org/xsd/maven-4.0.0.xsd">
           <excludeFilterFile>${basedir}/dev-support/findbugsExcludeFile.xml</excludeFilterFile>
         </configuration>
       </plugin>
+      <plugin>
+        <artifactId>maven-enforcer-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>depcheck</id>
+            <phase></phase>
+          </execution>
+          <execution>
+            <id>banned-rocksdb-imports</id>
+            <phase>process-sources</phase>
+            <goals>
+              <goal>enforce</goal>
+            </goals>
+            <configuration>
+              <rules>
+                <RestrictImports>
+                  <includeTestCode>false</includeTestCode>
+                  <reason>Use managed RocksObjects under org.apache.hadoop.hdds.utils.db.managed instead.</reason>
+                  <!-- By default, ban all the classes in org.rocksdb -->
+                  <bannedImport>org.rocksdb.**</bannedImport>
+                  <allowedImports>
+                    <!-- Allow these imports for snapshot diff. We can't use managed DB for Snapshot diff because

Review Comment:
   well, if that's true, you can add those snapshot diff classes in a package and put it under `exclusion`.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] smengcl commented on a diff in pull request #4190: HDDS-7800. Use persistent map to generate snapshot diff report

Posted by "smengcl (via GitHub)" <gi...@apache.org>.
smengcl commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1087485534


##########
hadoop-hdds/common/src/main/resources/ozone-default.xml:
##########
@@ -3592,4 +3592,19 @@
       history from compaction DAG. Uses millisecond by default when no time unit is specified.
     </description>
   </property>
+
+  <property>
+    <name>ozone.om.snapshot.diff.db.dirs</name>
+    <value/>
+    <tag>OZONE, OM </tag>

Review Comment:
   ```suggestion
       <tag>OZONE, OM</tag>
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hemantk-12 commented on a diff in pull request #4190: HDDS-7800. Use persistent map to generate snapshot diff report

Posted by "hemantk-12 (via GitHub)" <gi...@apache.org>.
hemantk-12 commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1091294392


##########
hadoop-ozone/ozone-manager/pom.xml:
##########
@@ -271,6 +271,44 @@ https://maven.apache.org/xsd/maven-4.0.0.xsd">
           <excludeFilterFile>${basedir}/dev-support/findbugsExcludeFile.xml</excludeFilterFile>
         </configuration>
       </plugin>
+      <plugin>
+        <artifactId>maven-enforcer-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>depcheck</id>
+            <phase></phase>
+          </execution>
+          <execution>
+            <id>banned-rocksdb-imports</id>
+            <phase>process-sources</phase>
+            <goals>
+              <goal>enforce</goal>
+            </goals>
+            <configuration>
+              <rules>
+                <RestrictImports>
+                  <includeTestCode>false</includeTestCode>
+                  <reason>Use managed RocksObjects under org.apache.hadoop.hdds.utils.db.managed instead.</reason>
+                  <!-- By default, ban all the classes in org.rocksdb -->
+                  <bannedImport>org.rocksdb.**</bannedImport>
+                  <allowedImports>
+                    <!-- Allow these imports for snapshot diff. We can't use managed DB for Snapshot diff because

Review Comment:
   You can't have multiple exclusion rules unfortunately. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hemantk-12 commented on a diff in pull request #4190: HDDS-7800. [Snapshot] Use persistent map and set to generate snapshot diff report

Posted by "hemantk-12 (via GitHub)" <gi...@apache.org>.
hemantk-12 commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1101918731


##########
hadoop-ozone/interface-storage/src/main/java/org/apache/hadoop/ozone/om/codec/OmDBDiffReportEntryCodec.java:
##########
@@ -0,0 +1,53 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.om.codec;
+
+import java.io.IOException;
+import org.apache.hadoop.hdds.utils.db.Codec;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos;
+import org.apache.hadoop.ozone.snapshot.SnapshotDiffReport.DiffReportEntry;
+
+import static com.google.common.base.Preconditions.checkNotNull;
+
+/**
+ * Codec to encode DiffReportEntry as byte array.
+ */
+public class OmDBDiffReportEntryCodec implements Codec<DiffReportEntry> {
+
+  @Override
+  public byte[] toPersistedFormat(DiffReportEntry object)
+      throws IOException {
+    checkNotNull(object, "Null object can't be converted to byte array.");
+    return object.toProtobuf().toByteArray();
+  }
+
+  @Override
+  public DiffReportEntry fromPersistedFormat(byte[] rawData)
+      throws IOException {
+    checkNotNull(rawData, "Null byte array can't be converted to " +
+        "real object.");
+    return DiffReportEntry.fromProtobuf(
+        OzoneManagerProtocolProtos.DiffReportEntryProto.parseFrom(rawData));
+  }
+
+  @Override
+  public DiffReportEntry copyObject(DiffReportEntry object) {
+    // Note: Not really a "copy". from OmDBDiffReportEntryCodec
+    return object;

Review Comment:
   I don't think we use `copyObject()` as such. I just followed other codec implementation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hemantk-12 commented on a diff in pull request #4190: HDDS-7800. [Snapshot] Use persistent map and set to generate snapshot diff report

Posted by "hemantk-12 (via GitHub)" <gi...@apache.org>.
hemantk-12 commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1091294016


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -271,61 +359,130 @@ private Set<String> getDeltaFiles(OmSnapshot fromSnapshot,
   }
 
   private List<DiffReportEntry> generateDiffReport(
-      final Set<Long> objectIDsToCheck,
-      final Map<Long, String> oldObjIdToKeyMap,
-      final Map<Long, String> newObjIdToKeyMap) {
-
-    final List<DiffReportEntry> deleteDiffs = new ArrayList<>();
-    final List<DiffReportEntry> renameDiffs = new ArrayList<>();
-    final List<DiffReportEntry> createDiffs = new ArrayList<>();
-    final List<DiffReportEntry> modifyDiffs = new ArrayList<>();
-
-
-    for (Long id : objectIDsToCheck) {
-      /*
-       * This key can be
-       * -> Created after the old snapshot was taken, which means it will be
-       *    missing in oldKeyTable and present in newKeyTable.
-       * -> Deleted after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and missing in newKeyTable.
-       * -> Modified after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable with same
-       *    Object ID but with different metadata.
-       * -> Renamed after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable but with different
-       *    name and same Object ID.
-       */
-
-      final String oldKeyName = oldObjIdToKeyMap.get(id);
-      final String newKeyName = newObjIdToKeyMap.get(id);
-
-      if (oldKeyName == null && newKeyName == null) {
-        // This cannot happen.
-        continue;
+      final String requestId,
+      final PersistentMap<Long, byte[]> objectIDsToCheck,
+      final PersistentMap<Long, String> oldObjIdToKeyMap,
+      final PersistentMap<Long, String> newObjIdToKeyMap
+  ) throws RocksDBException, IOException {
+
+    // RequestId is prepended to column family name to make it unique
+    // for request.
+    ColumnFamilyHandle deleteDiffColumnFamily = null;
+    ColumnFamilyHandle renameDiffColumnFamily = null;
+    ColumnFamilyHandle createDiffColumnFamily = null;
+    ColumnFamilyHandle modifyDiffColumnFamily = null;
+
+    try {
+      deleteDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-deleteDiff"),
+              new ColumnFamilyOptions()));
+      renameDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-renameDiff"),
+              new ColumnFamilyOptions()));
+      createDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-createDiff"),
+              new ColumnFamilyOptions()));
+      modifyDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-modifyDiff"),
+              new ColumnFamilyOptions()));
+
+      final PersistentMap<Long, DiffReportEntry> deleteDiffs =
+          createDiffReportPersistentMap(deleteDiffColumnFamily);

Review Comment:
   Thanks George. This PR is more like a POC to validate the thoughts.  I will adjust RocksDB configuration either in next revision or follow up PR. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sadanand48 commented on a diff in pull request #4190: HDDS-7800. Use persistent map to generate snapshot diff report

Posted by GitBox <gi...@apache.org>.
sadanand48 commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1081599774


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -271,61 +359,130 @@ private Set<String> getDeltaFiles(OmSnapshot fromSnapshot,
   }
 
   private List<DiffReportEntry> generateDiffReport(
-      final Set<Long> objectIDsToCheck,
-      final Map<Long, String> oldObjIdToKeyMap,
-      final Map<Long, String> newObjIdToKeyMap) {
-
-    final List<DiffReportEntry> deleteDiffs = new ArrayList<>();
-    final List<DiffReportEntry> renameDiffs = new ArrayList<>();
-    final List<DiffReportEntry> createDiffs = new ArrayList<>();
-    final List<DiffReportEntry> modifyDiffs = new ArrayList<>();
-
-
-    for (Long id : objectIDsToCheck) {
-      /*
-       * This key can be
-       * -> Created after the old snapshot was taken, which means it will be
-       *    missing in oldKeyTable and present in newKeyTable.
-       * -> Deleted after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and missing in newKeyTable.
-       * -> Modified after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable with same
-       *    Object ID but with different metadata.
-       * -> Renamed after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable but with different
-       *    name and same Object ID.
-       */
-
-      final String oldKeyName = oldObjIdToKeyMap.get(id);
-      final String newKeyName = newObjIdToKeyMap.get(id);
-
-      if (oldKeyName == null && newKeyName == null) {
-        // This cannot happen.
-        continue;
+      final String requestId,
+      final PersistentMap<Long, byte[]> objectIDsToCheck,
+      final PersistentMap<Long, String> oldObjIdToKeyMap,
+      final PersistentMap<Long, String> newObjIdToKeyMap
+  ) throws RocksDBException, IOException {
+
+    // RequestId is prepended to column family name to make it unique
+    // for request.
+    ColumnFamilyHandle deleteDiffColumnFamily = null;
+    ColumnFamilyHandle renameDiffColumnFamily = null;
+    ColumnFamilyHandle createDiffColumnFamily = null;
+    ColumnFamilyHandle modifyDiffColumnFamily = null;
+
+    try {
+      deleteDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-deleteDiff"),
+              new ColumnFamilyOptions()));
+      renameDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-renameDiff"),
+              new ColumnFamilyOptions()));
+      createDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-createDiff"),
+              new ColumnFamilyOptions()));
+      modifyDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-modifyDiff"),
+              new ColumnFamilyOptions()));
+
+      final PersistentMap<Long, DiffReportEntry> deleteDiffs =
+          createDiffReportPersistentMap(deleteDiffColumnFamily);
+      final PersistentMap<Long, DiffReportEntry> renameDiffs =
+          createDiffReportPersistentMap(renameDiffColumnFamily);
+      final PersistentMap<Long, DiffReportEntry> createDiffs =
+          createDiffReportPersistentMap(createDiffColumnFamily);
+      final PersistentMap<Long, DiffReportEntry> modifyDiffs =
+          createDiffReportPersistentMap(modifyDiffColumnFamily);
+
+
+      long deleteCounter = 0L;
+      long renameCounter = 0L;
+      long createCounter = 0L;
+      long modifyCounter = 0L;
+
+      Iterator<Long> objectIdsIterator = objectIDsToCheck.getKeyIterator();
+      while (objectIdsIterator.hasNext()) {
+        Long id = objectIdsIterator.next();
+        /*
+         * This key can be
+         * -> Created after the old snapshot was taken, which means it will be
+         *    missing in oldKeyTable and present in newKeyTable.
+         * -> Deleted after the old snapshot was taken, which means it will be
+         *    present in oldKeyTable and missing in newKeyTable.
+         * -> Modified after the old snapshot was taken, which means it will be
+         *    present in oldKeyTable and present in newKeyTable with same
+         *    Object ID but with different metadata.
+         * -> Renamed after the old snapshot was taken, which means it will be
+         *    present in oldKeyTable and present in newKeyTable but with
+         *    different name and same Object ID.
+         */
+
+        final String oldKeyName = oldObjIdToKeyMap.get(id);
+        final String newKeyName = newObjIdToKeyMap.get(id);
+
+        if (oldKeyName == null && newKeyName == null) {
+          // This cannot happen.
+          throw new IllegalStateException("Old and new key name both are null");
+        } else if (oldKeyName == null) { // Key Created.
+          createDiffs.put(createCounter++,
+              DiffReportEntry.of(DiffType.CREATE, newKeyName));
+        } else if (newKeyName == null) { // Key Deleted.
+          deleteDiffs.put(deleteCounter++,
+              DiffReportEntry.of(DiffType.DELETE, oldKeyName));
+        } else if (oldKeyName.equals(newKeyName)) { // Key modified.
+          modifyDiffs.put(modifyCounter++,
+              DiffReportEntry.of(DiffType.MODIFY, newKeyName));
+        } else { // Key Renamed.
+          renameDiffs.put(renameCounter++,
+              DiffReportEntry.of(DiffType.RENAME, oldKeyName, newKeyName));
+        }
       }
 
-      // Key Created.
-      if (oldKeyName == null) {
-        createDiffs.add(DiffReportEntry.of(DiffType.CREATE, newKeyName));
-        continue;
+      return aggregateDiffReports(deleteDiffs,
+          renameDiffs,
+          createDiffs,
+          modifyDiffs);
+    } finally {
+      if (deleteDiffColumnFamily != null) {
+        rocksDB.dropColumnFamily(deleteDiffColumnFamily);
+        deleteDiffColumnFamily.close();
       }
-
-      // Key Deleted.
-      if (newKeyName == null) {
-        deleteDiffs.add(DiffReportEntry.of(DiffType.DELETE, oldKeyName));
-        continue;
+      if (renameDiffColumnFamily != null) {

Review Comment:
   One benefit I see of storing these diffs is if we don't drop them here and persist them with the from and to snapshot Info we can easily use it to store precomputed snapdiffs.



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -271,61 +359,130 @@ private Set<String> getDeltaFiles(OmSnapshot fromSnapshot,
   }
 
   private List<DiffReportEntry> generateDiffReport(
-      final Set<Long> objectIDsToCheck,
-      final Map<Long, String> oldObjIdToKeyMap,
-      final Map<Long, String> newObjIdToKeyMap) {
-
-    final List<DiffReportEntry> deleteDiffs = new ArrayList<>();
-    final List<DiffReportEntry> renameDiffs = new ArrayList<>();
-    final List<DiffReportEntry> createDiffs = new ArrayList<>();
-    final List<DiffReportEntry> modifyDiffs = new ArrayList<>();
-
-
-    for (Long id : objectIDsToCheck) {
-      /*
-       * This key can be
-       * -> Created after the old snapshot was taken, which means it will be
-       *    missing in oldKeyTable and present in newKeyTable.
-       * -> Deleted after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and missing in newKeyTable.
-       * -> Modified after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable with same
-       *    Object ID but with different metadata.
-       * -> Renamed after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable but with different
-       *    name and same Object ID.
-       */
-
-      final String oldKeyName = oldObjIdToKeyMap.get(id);
-      final String newKeyName = newObjIdToKeyMap.get(id);
-
-      if (oldKeyName == null && newKeyName == null) {
-        // This cannot happen.
-        continue;
+      final String requestId,
+      final PersistentMap<Long, byte[]> objectIDsToCheck,
+      final PersistentMap<Long, String> oldObjIdToKeyMap,
+      final PersistentMap<Long, String> newObjIdToKeyMap
+  ) throws RocksDBException, IOException {
+
+    // RequestId is prepended to column family name to make it unique
+    // for request.
+    ColumnFamilyHandle deleteDiffColumnFamily = null;
+    ColumnFamilyHandle renameDiffColumnFamily = null;
+    ColumnFamilyHandle createDiffColumnFamily = null;
+    ColumnFamilyHandle modifyDiffColumnFamily = null;
+
+    try {
+      deleteDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-deleteDiff"),
+              new ColumnFamilyOptions()));
+      renameDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-renameDiff"),
+              new ColumnFamilyOptions()));
+      createDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-createDiff"),
+              new ColumnFamilyOptions()));
+      modifyDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-modifyDiff"),
+              new ColumnFamilyOptions()));
+
+      final PersistentMap<Long, DiffReportEntry> deleteDiffs =
+          createDiffReportPersistentMap(deleteDiffColumnFamily);

Review Comment:
   Thanks @hemantk-12 for working on this. I had some questions
   1. The jira states that the objects would be persisted only if they are too large to fit in memory but I see here we are always persisting these objects, are we planning to always persist?
   2. For the diffs i.e create , delete etc I see we are persisting in Rocksdb and later populating all the entries inside a List so I don't think we are gaining much by persisting because ultimately we are populating it in memory ( in `aggregateDiffReports`).  
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hemantk-12 commented on a diff in pull request #4190: HDDS-7800. Use persistent map to generate snapshot diff report

Posted by GitBox <gi...@apache.org>.
hemantk-12 commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1081843020


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -271,61 +359,130 @@ private Set<String> getDeltaFiles(OmSnapshot fromSnapshot,
   }
 
   private List<DiffReportEntry> generateDiffReport(
-      final Set<Long> objectIDsToCheck,
-      final Map<Long, String> oldObjIdToKeyMap,
-      final Map<Long, String> newObjIdToKeyMap) {
-
-    final List<DiffReportEntry> deleteDiffs = new ArrayList<>();
-    final List<DiffReportEntry> renameDiffs = new ArrayList<>();
-    final List<DiffReportEntry> createDiffs = new ArrayList<>();
-    final List<DiffReportEntry> modifyDiffs = new ArrayList<>();
-
-
-    for (Long id : objectIDsToCheck) {
-      /*
-       * This key can be
-       * -> Created after the old snapshot was taken, which means it will be
-       *    missing in oldKeyTable and present in newKeyTable.
-       * -> Deleted after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and missing in newKeyTable.
-       * -> Modified after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable with same
-       *    Object ID but with different metadata.
-       * -> Renamed after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable but with different
-       *    name and same Object ID.
-       */
-
-      final String oldKeyName = oldObjIdToKeyMap.get(id);
-      final String newKeyName = newObjIdToKeyMap.get(id);
-
-      if (oldKeyName == null && newKeyName == null) {
-        // This cannot happen.
-        continue;
+      final String requestId,
+      final PersistentMap<Long, byte[]> objectIDsToCheck,
+      final PersistentMap<Long, String> oldObjIdToKeyMap,
+      final PersistentMap<Long, String> newObjIdToKeyMap
+  ) throws RocksDBException, IOException {
+
+    // RequestId is prepended to column family name to make it unique
+    // for request.
+    ColumnFamilyHandle deleteDiffColumnFamily = null;
+    ColumnFamilyHandle renameDiffColumnFamily = null;
+    ColumnFamilyHandle createDiffColumnFamily = null;
+    ColumnFamilyHandle modifyDiffColumnFamily = null;
+
+    try {
+      deleteDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-deleteDiff"),
+              new ColumnFamilyOptions()));
+      renameDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-renameDiff"),
+              new ColumnFamilyOptions()));
+      createDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-createDiff"),
+              new ColumnFamilyOptions()));
+      modifyDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-modifyDiff"),
+              new ColumnFamilyOptions()));
+
+      final PersistentMap<Long, DiffReportEntry> deleteDiffs =
+          createDiffReportPersistentMap(deleteDiffColumnFamily);

Review Comment:
   For point 2: yes, we need to persist `aggregateDiffReports` or final report. I added a TODO which will be addressed in next revision.



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -271,61 +359,130 @@ private Set<String> getDeltaFiles(OmSnapshot fromSnapshot,
   }
 
   private List<DiffReportEntry> generateDiffReport(
-      final Set<Long> objectIDsToCheck,
-      final Map<Long, String> oldObjIdToKeyMap,
-      final Map<Long, String> newObjIdToKeyMap) {
-
-    final List<DiffReportEntry> deleteDiffs = new ArrayList<>();
-    final List<DiffReportEntry> renameDiffs = new ArrayList<>();
-    final List<DiffReportEntry> createDiffs = new ArrayList<>();
-    final List<DiffReportEntry> modifyDiffs = new ArrayList<>();
-
-
-    for (Long id : objectIDsToCheck) {
-      /*
-       * This key can be
-       * -> Created after the old snapshot was taken, which means it will be
-       *    missing in oldKeyTable and present in newKeyTable.
-       * -> Deleted after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and missing in newKeyTable.
-       * -> Modified after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable with same
-       *    Object ID but with different metadata.
-       * -> Renamed after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable but with different
-       *    name and same Object ID.
-       */
-
-      final String oldKeyName = oldObjIdToKeyMap.get(id);
-      final String newKeyName = newObjIdToKeyMap.get(id);
-
-      if (oldKeyName == null && newKeyName == null) {
-        // This cannot happen.
-        continue;
+      final String requestId,
+      final PersistentMap<Long, byte[]> objectIDsToCheck,
+      final PersistentMap<Long, String> oldObjIdToKeyMap,
+      final PersistentMap<Long, String> newObjIdToKeyMap
+  ) throws RocksDBException, IOException {
+
+    // RequestId is prepended to column family name to make it unique
+    // for request.
+    ColumnFamilyHandle deleteDiffColumnFamily = null;
+    ColumnFamilyHandle renameDiffColumnFamily = null;
+    ColumnFamilyHandle createDiffColumnFamily = null;
+    ColumnFamilyHandle modifyDiffColumnFamily = null;
+
+    try {
+      deleteDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-deleteDiff"),
+              new ColumnFamilyOptions()));
+      renameDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-renameDiff"),
+              new ColumnFamilyOptions()));
+      createDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-createDiff"),
+              new ColumnFamilyOptions()));
+      modifyDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-modifyDiff"),
+              new ColumnFamilyOptions()));
+
+      final PersistentMap<Long, DiffReportEntry> deleteDiffs =
+          createDiffReportPersistentMap(deleteDiffColumnFamily);
+      final PersistentMap<Long, DiffReportEntry> renameDiffs =
+          createDiffReportPersistentMap(renameDiffColumnFamily);
+      final PersistentMap<Long, DiffReportEntry> createDiffs =
+          createDiffReportPersistentMap(createDiffColumnFamily);
+      final PersistentMap<Long, DiffReportEntry> modifyDiffs =
+          createDiffReportPersistentMap(modifyDiffColumnFamily);
+
+
+      long deleteCounter = 0L;
+      long renameCounter = 0L;
+      long createCounter = 0L;
+      long modifyCounter = 0L;
+
+      Iterator<Long> objectIdsIterator = objectIDsToCheck.getKeyIterator();
+      while (objectIdsIterator.hasNext()) {
+        Long id = objectIdsIterator.next();
+        /*
+         * This key can be
+         * -> Created after the old snapshot was taken, which means it will be
+         *    missing in oldKeyTable and present in newKeyTable.
+         * -> Deleted after the old snapshot was taken, which means it will be
+         *    present in oldKeyTable and missing in newKeyTable.
+         * -> Modified after the old snapshot was taken, which means it will be
+         *    present in oldKeyTable and present in newKeyTable with same
+         *    Object ID but with different metadata.
+         * -> Renamed after the old snapshot was taken, which means it will be
+         *    present in oldKeyTable and present in newKeyTable but with
+         *    different name and same Object ID.
+         */
+
+        final String oldKeyName = oldObjIdToKeyMap.get(id);
+        final String newKeyName = newObjIdToKeyMap.get(id);
+
+        if (oldKeyName == null && newKeyName == null) {
+          // This cannot happen.
+          throw new IllegalStateException("Old and new key name both are null");
+        } else if (oldKeyName == null) { // Key Created.
+          createDiffs.put(createCounter++,
+              DiffReportEntry.of(DiffType.CREATE, newKeyName));
+        } else if (newKeyName == null) { // Key Deleted.
+          deleteDiffs.put(deleteCounter++,
+              DiffReportEntry.of(DiffType.DELETE, oldKeyName));
+        } else if (oldKeyName.equals(newKeyName)) { // Key modified.
+          modifyDiffs.put(modifyCounter++,
+              DiffReportEntry.of(DiffType.MODIFY, newKeyName));
+        } else { // Key Renamed.
+          renameDiffs.put(renameCounter++,
+              DiffReportEntry.of(DiffType.RENAME, oldKeyName, newKeyName));
+        }
       }
 
-      // Key Created.
-      if (oldKeyName == null) {
-        createDiffs.add(DiffReportEntry.of(DiffType.CREATE, newKeyName));
-        continue;
+      return aggregateDiffReports(deleteDiffs,
+          renameDiffs,
+          createDiffs,
+          modifyDiffs);
+    } finally {
+      if (deleteDiffColumnFamily != null) {
+        rocksDB.dropColumnFamily(deleteDiffColumnFamily);
+        deleteDiffColumnFamily.close();
       }
-
-      // Key Deleted.
-      if (newKeyName == null) {
-        deleteDiffs.add(DiffReportEntry.of(DiffType.DELETE, oldKeyName));
-        continue;
+      if (renameDiffColumnFamily != null) {

Review Comment:
   I think instead of persisting `deleteDiffs`, `renameDiffs`, `createDiffs` and `modifyDiffs`, we can persist `oldObjIdToKeyMap` and `newObjIdToKeyMap` because they are one used to generate `deleteDiffs`, `renameDiffs`, `createDiffs` and `modifyDiffs`. And it can be different for different snapshots.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] GeorgeJahad commented on a diff in pull request #4190: HDDS-7800. Use persistent map to generate snapshot diff report

Posted by "GeorgeJahad (via GitHub)" <gi...@apache.org>.
GeorgeJahad commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1089564263


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -271,61 +359,130 @@ private Set<String> getDeltaFiles(OmSnapshot fromSnapshot,
   }
 
   private List<DiffReportEntry> generateDiffReport(
-      final Set<Long> objectIDsToCheck,
-      final Map<Long, String> oldObjIdToKeyMap,
-      final Map<Long, String> newObjIdToKeyMap) {
-
-    final List<DiffReportEntry> deleteDiffs = new ArrayList<>();
-    final List<DiffReportEntry> renameDiffs = new ArrayList<>();
-    final List<DiffReportEntry> createDiffs = new ArrayList<>();
-    final List<DiffReportEntry> modifyDiffs = new ArrayList<>();
-
-
-    for (Long id : objectIDsToCheck) {
-      /*
-       * This key can be
-       * -> Created after the old snapshot was taken, which means it will be
-       *    missing in oldKeyTable and present in newKeyTable.
-       * -> Deleted after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and missing in newKeyTable.
-       * -> Modified after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable with same
-       *    Object ID but with different metadata.
-       * -> Renamed after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable but with different
-       *    name and same Object ID.
-       */
-
-      final String oldKeyName = oldObjIdToKeyMap.get(id);
-      final String newKeyName = newObjIdToKeyMap.get(id);
-
-      if (oldKeyName == null && newKeyName == null) {
-        // This cannot happen.
-        continue;
+      final String requestId,
+      final PersistentMap<Long, byte[]> objectIDsToCheck,
+      final PersistentMap<Long, String> oldObjIdToKeyMap,
+      final PersistentMap<Long, String> newObjIdToKeyMap
+  ) throws RocksDBException, IOException {
+
+    // RequestId is prepended to column family name to make it unique
+    // for request.
+    ColumnFamilyHandle deleteDiffColumnFamily = null;
+    ColumnFamilyHandle renameDiffColumnFamily = null;
+    ColumnFamilyHandle createDiffColumnFamily = null;
+    ColumnFamilyHandle modifyDiffColumnFamily = null;
+
+    try {
+      deleteDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-deleteDiff"),
+              new ColumnFamilyOptions()));
+      renameDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-renameDiff"),
+              new ColumnFamilyOptions()));
+      createDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-createDiff"),
+              new ColumnFamilyOptions()));
+      modifyDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-modifyDiff"),
+              new ColumnFamilyOptions()));
+
+      final PersistentMap<Long, DiffReportEntry> deleteDiffs =
+          createDiffReportPersistentMap(deleteDiffColumnFamily);

Review Comment:
   > We will rely on RocksDB's mem-tables size for that as of now
   
   
   I like the idea of letting Rocksdb manage memory, but if we don't want a disk write on each db op, we should consider disabling the WAL:
   http://rocksdb.org/blog/2017/08/25/flushwal.html
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] GeorgeJahad commented on a diff in pull request #4190: HDDS-7800. Use persistent map to generate snapshot diff report

Posted by "GeorgeJahad (via GitHub)" <gi...@apache.org>.
GeorgeJahad commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1089564790


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java:
##########
@@ -268,4 +280,29 @@ private void verifySnapshotInfoForSnapDiff(final SnapshotInfo fromSnapshot,
           " should be older than to toSnapshot:" + toSnapshot.getName());
     }
   }
+
+  private RocksDB createRocksDbForSnapshotDiff(OzoneConfiguration config) {

Review Comment:
   It seems like this new rocksdb should not be included when bootstraping a follower, is that right?
   



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java:
##########
@@ -268,4 +280,29 @@ private void verifySnapshotInfoForSnapDiff(final SnapshotInfo fromSnapshot,
           " should be older than to toSnapshot:" + toSnapshot.getName());
     }
   }
+
+  private RocksDB createRocksDbForSnapshotDiff(OzoneConfiguration config) {

Review Comment:
   It seems like this new rocksdb should not be included when bootstrapping a follower, is that right?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hemantk-12 commented on a diff in pull request #4190: HDDS-7800. Use persistent map to generate snapshot diff report

Posted by "hemantk-12 (via GitHub)" <gi...@apache.org>.
hemantk-12 commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1091294016


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -271,61 +359,130 @@ private Set<String> getDeltaFiles(OmSnapshot fromSnapshot,
   }
 
   private List<DiffReportEntry> generateDiffReport(
-      final Set<Long> objectIDsToCheck,
-      final Map<Long, String> oldObjIdToKeyMap,
-      final Map<Long, String> newObjIdToKeyMap) {
-
-    final List<DiffReportEntry> deleteDiffs = new ArrayList<>();
-    final List<DiffReportEntry> renameDiffs = new ArrayList<>();
-    final List<DiffReportEntry> createDiffs = new ArrayList<>();
-    final List<DiffReportEntry> modifyDiffs = new ArrayList<>();
-
-
-    for (Long id : objectIDsToCheck) {
-      /*
-       * This key can be
-       * -> Created after the old snapshot was taken, which means it will be
-       *    missing in oldKeyTable and present in newKeyTable.
-       * -> Deleted after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and missing in newKeyTable.
-       * -> Modified after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable with same
-       *    Object ID but with different metadata.
-       * -> Renamed after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable but with different
-       *    name and same Object ID.
-       */
-
-      final String oldKeyName = oldObjIdToKeyMap.get(id);
-      final String newKeyName = newObjIdToKeyMap.get(id);
-
-      if (oldKeyName == null && newKeyName == null) {
-        // This cannot happen.
-        continue;
+      final String requestId,
+      final PersistentMap<Long, byte[]> objectIDsToCheck,
+      final PersistentMap<Long, String> oldObjIdToKeyMap,
+      final PersistentMap<Long, String> newObjIdToKeyMap
+  ) throws RocksDBException, IOException {
+
+    // RequestId is prepended to column family name to make it unique
+    // for request.
+    ColumnFamilyHandle deleteDiffColumnFamily = null;
+    ColumnFamilyHandle renameDiffColumnFamily = null;
+    ColumnFamilyHandle createDiffColumnFamily = null;
+    ColumnFamilyHandle modifyDiffColumnFamily = null;
+
+    try {
+      deleteDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-deleteDiff"),
+              new ColumnFamilyOptions()));
+      renameDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-renameDiff"),
+              new ColumnFamilyOptions()));
+      createDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-createDiff"),
+              new ColumnFamilyOptions()));
+      modifyDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-modifyDiff"),
+              new ColumnFamilyOptions()));
+
+      final PersistentMap<Long, DiffReportEntry> deleteDiffs =
+          createDiffReportPersistentMap(deleteDiffColumnFamily);

Review Comment:
   Thanks George. This PR is more like a PC to validate the thoughts.  We will adjust RocksDB configuration either in next revision or follow up PR. 



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -271,61 +359,130 @@ private Set<String> getDeltaFiles(OmSnapshot fromSnapshot,
   }
 
   private List<DiffReportEntry> generateDiffReport(
-      final Set<Long> objectIDsToCheck,
-      final Map<Long, String> oldObjIdToKeyMap,
-      final Map<Long, String> newObjIdToKeyMap) {
-
-    final List<DiffReportEntry> deleteDiffs = new ArrayList<>();
-    final List<DiffReportEntry> renameDiffs = new ArrayList<>();
-    final List<DiffReportEntry> createDiffs = new ArrayList<>();
-    final List<DiffReportEntry> modifyDiffs = new ArrayList<>();
-
-
-    for (Long id : objectIDsToCheck) {
-      /*
-       * This key can be
-       * -> Created after the old snapshot was taken, which means it will be
-       *    missing in oldKeyTable and present in newKeyTable.
-       * -> Deleted after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and missing in newKeyTable.
-       * -> Modified after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable with same
-       *    Object ID but with different metadata.
-       * -> Renamed after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable but with different
-       *    name and same Object ID.
-       */
-
-      final String oldKeyName = oldObjIdToKeyMap.get(id);
-      final String newKeyName = newObjIdToKeyMap.get(id);
-
-      if (oldKeyName == null && newKeyName == null) {
-        // This cannot happen.
-        continue;
+      final String requestId,
+      final PersistentMap<Long, byte[]> objectIDsToCheck,
+      final PersistentMap<Long, String> oldObjIdToKeyMap,
+      final PersistentMap<Long, String> newObjIdToKeyMap
+  ) throws RocksDBException, IOException {
+
+    // RequestId is prepended to column family name to make it unique
+    // for request.
+    ColumnFamilyHandle deleteDiffColumnFamily = null;
+    ColumnFamilyHandle renameDiffColumnFamily = null;
+    ColumnFamilyHandle createDiffColumnFamily = null;
+    ColumnFamilyHandle modifyDiffColumnFamily = null;
+
+    try {
+      deleteDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-deleteDiff"),
+              new ColumnFamilyOptions()));
+      renameDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-renameDiff"),
+              new ColumnFamilyOptions()));
+      createDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-createDiff"),
+              new ColumnFamilyOptions()));
+      modifyDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-modifyDiff"),
+              new ColumnFamilyOptions()));
+
+      final PersistentMap<Long, DiffReportEntry> deleteDiffs =
+          createDiffReportPersistentMap(deleteDiffColumnFamily);

Review Comment:
   Thanks George. This PR is more like a PC to validate the thoughts.  I will adjust RocksDB configuration either in next revision or follow up PR. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hemantk-12 commented on a diff in pull request #4190: HDDS-7800. Use persistent map to generate snapshot diff report

Posted by GitBox <gi...@apache.org>.
hemantk-12 commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1081843020


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -271,61 +359,130 @@ private Set<String> getDeltaFiles(OmSnapshot fromSnapshot,
   }
 
   private List<DiffReportEntry> generateDiffReport(
-      final Set<Long> objectIDsToCheck,
-      final Map<Long, String> oldObjIdToKeyMap,
-      final Map<Long, String> newObjIdToKeyMap) {
-
-    final List<DiffReportEntry> deleteDiffs = new ArrayList<>();
-    final List<DiffReportEntry> renameDiffs = new ArrayList<>();
-    final List<DiffReportEntry> createDiffs = new ArrayList<>();
-    final List<DiffReportEntry> modifyDiffs = new ArrayList<>();
-
-
-    for (Long id : objectIDsToCheck) {
-      /*
-       * This key can be
-       * -> Created after the old snapshot was taken, which means it will be
-       *    missing in oldKeyTable and present in newKeyTable.
-       * -> Deleted after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and missing in newKeyTable.
-       * -> Modified after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable with same
-       *    Object ID but with different metadata.
-       * -> Renamed after the old snapshot was taken, which means it will be
-       *    present in oldKeyTable and present in newKeyTable but with different
-       *    name and same Object ID.
-       */
-
-      final String oldKeyName = oldObjIdToKeyMap.get(id);
-      final String newKeyName = newObjIdToKeyMap.get(id);
-
-      if (oldKeyName == null && newKeyName == null) {
-        // This cannot happen.
-        continue;
+      final String requestId,
+      final PersistentMap<Long, byte[]> objectIDsToCheck,
+      final PersistentMap<Long, String> oldObjIdToKeyMap,
+      final PersistentMap<Long, String> newObjIdToKeyMap
+  ) throws RocksDBException, IOException {
+
+    // RequestId is prepended to column family name to make it unique
+    // for request.
+    ColumnFamilyHandle deleteDiffColumnFamily = null;
+    ColumnFamilyHandle renameDiffColumnFamily = null;
+    ColumnFamilyHandle createDiffColumnFamily = null;
+    ColumnFamilyHandle modifyDiffColumnFamily = null;
+
+    try {
+      deleteDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-deleteDiff"),
+              new ColumnFamilyOptions()));
+      renameDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-renameDiff"),
+              new ColumnFamilyOptions()));
+      createDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-createDiff"),
+              new ColumnFamilyOptions()));
+      modifyDiffColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-modifyDiff"),
+              new ColumnFamilyOptions()));
+
+      final PersistentMap<Long, DiffReportEntry> deleteDiffs =
+          createDiffReportPersistentMap(deleteDiffColumnFamily);

Review Comment:
   1. We will rely on RocksDB's mem-tables size for that as of now and control it by mem-table sizes. Having customer configuration for on-head, off-heap and disk usage will increase the complexity. We don't have to go into that granularity for now. We can  come back to that later if needed.
   2. Yes, we need to persist `aggregateDiffReports` or final report. I added a TODO which will be addressed in next revision.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] duongkame commented on a diff in pull request #4190: HDDS-7800. Use persistent map to generate snapshot diff report

Posted by "duongkame (via GitHub)" <gi...@apache.org>.
duongkame commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1091303192


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java:
##########
@@ -268,4 +280,29 @@ private void verifySnapshotInfoForSnapDiff(final SnapshotInfo fromSnapshot,
           " should be older than to toSnapshot:" + toSnapshot.getName());
     }
   }
+
+  private RocksDB createRocksDbForSnapshotDiff(OzoneConfiguration config) {
+    final Options options = new Options().setCreateIfMissing(true);
+
+    final File dbDirPath =
+        ServerUtils.getDBPath(config, OZONE_OM_SNAPSHOT_DIFF_DB_DIR);
+
+    String dbPath = Paths.get(dbDirPath.toString(), OM_SNAPSHOT_DIFF_DB_NAME)
+        .toFile()
+        .getAbsolutePath();
+
+    try {
+      return RocksDB.open(options, dbPath);

Review Comment:
   Please add this `open ` overload to ManagedRocksDB and used it. It'll help you assert if RocksDb objects are not closed properly



##########
hadoop-ozone/ozone-manager/pom.xml:
##########
@@ -271,6 +271,44 @@ https://maven.apache.org/xsd/maven-4.0.0.xsd">
           <excludeFilterFile>${basedir}/dev-support/findbugsExcludeFile.xml</excludeFilterFile>
         </configuration>
       </plugin>
+      <plugin>
+        <artifactId>maven-enforcer-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>depcheck</id>
+            <phase></phase>
+          </execution>
+          <execution>
+            <id>banned-rocksdb-imports</id>
+            <phase>process-sources</phase>
+            <goals>
+              <goal>enforce</goal>
+            </goals>
+            <configuration>
+              <rules>
+                <RestrictImports>
+                  <includeTestCode>false</includeTestCode>
+                  <reason>Use managed RocksObjects under org.apache.hadoop.hdds.utils.db.managed instead.</reason>
+                  <!-- By default, ban all the classes in org.rocksdb -->
+                  <bannedImport>org.rocksdb.**</bannedImport>
+                  <allowedImports>
+                    <!-- Allow these imports for snapshot diff. We can't use managed DB for Snapshot diff because

Review Comment:
   Actually,  the `managed` package can always be extended to adapt to new needs. And it's important for this feature to use managed RocksObject to assert memory leak. 



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/RocksDbPersistentList.java:
##########
@@ -0,0 +1,139 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.om.snapshot;
+
+import java.io.IOException;
+import java.util.Iterator;
+import org.apache.hadoop.hdds.utils.db.CodecRegistry;
+import org.rocksdb.ColumnFamilyHandle;
+import org.rocksdb.RocksDB;

Review Comment:
   Please use `managed` objects. 



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java:
##########
@@ -35,6 +38,8 @@
 import org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager;
 import org.apache.hadoop.ozone.snapshot.SnapshotDiffReport;
 import org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer;
+import org.rocksdb.Options;

Review Comment:
   As mentioned below, please extend the Managed objects and use them instead of using RocksObject directly. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] prashantpogde commented on a diff in pull request #4190: HDDS-7800. Use persistent map to generate snapshot diff report

Posted by "prashantpogde (via GitHub)" <gi...@apache.org>.
prashantpogde commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1093606831


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -113,54 +133,151 @@ public SnapshotDiffReport getSnapshotDiffReport(final String volume,
     final BucketLayout bucketLayout = getBucketLayout(volume, bucket,
         fromSnapshot.getMetadataManager());
 
-    /*
-     * The reason for having ObjectID to KeyName mapping instead of OmKeyInfo
-     * is to reduce the memory footprint.
-     */
-    final Map<Long, String> oldObjIdToKeyMap = new HashMap<>();
-    // Long --> const. length
-    // String --> var. length "/dir1/dir2/dir3/dir4/dir5/key1"
-    final Map<Long, String> newObjIdToKeyMap = new HashMap<>();
-
-    final Set<Long> objectIDsToCheck = new HashSet<>();
-
-    // add to object ID map for key/file.
-
-    final Table<String, OmKeyInfo> fsKeyTable = fromSnapshot
-        .getMetadataManager().getKeyTable(bucketLayout);
-    final Table<String, OmKeyInfo> tsKeyTable = toSnapshot
-        .getMetadataManager().getKeyTable(bucketLayout);
-    final Set<String> deltaFilesForKeyOrFileTable =
-        getDeltaFiles(fromSnapshot, toSnapshot,
-            Collections.singletonList(fsKeyTable.getName()),
-            fsInfo, tsInfo, volume, bucket);
-
-    addToObjectIdMap(fsKeyTable, tsKeyTable, deltaFilesForKeyOrFileTable,
-        oldObjIdToKeyMap, newObjIdToKeyMap, objectIDsToCheck, false);
-
-    if (bucketLayout.isFileSystemOptimized()) {
-      // add to object ID map for directory.
-      final Table<String, OmDirectoryInfo> fsDirTable =
-          fromSnapshot.getMetadataManager().getDirectoryTable();
-      final Table<String, OmDirectoryInfo> tsDirTable =
-          toSnapshot.getMetadataManager().getDirectoryTable();
-      final Set<String> deltaFilesForDirTable =
+    // TODO: This should comes from request itself.
+    String requestId = UUID.randomUUID().toString();
+
+    ColumnFamilyHandle fromSnapshotColumnFamily = null;
+    ColumnFamilyHandle toSnapshotColumnFamily = null;
+    ColumnFamilyHandle objectIDsColumnFamily = null;
+    ColumnFamilyHandle diffReportColumnFamily = null;
+
+    try {
+      // RequestId is prepended to column family name to make it unique
+      // for request.
+      fromSnapshotColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-fromSnapshot"),
+              new ColumnFamilyOptions()));
+      toSnapshotColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-toSnapshot"),
+              new ColumnFamilyOptions()));
+      objectIDsColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-objectIDs"),
+              new ColumnFamilyOptions()));
+      diffReportColumnFamily = rocksDB.createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-diffReport"),
+              new ColumnFamilyOptions()));
+      /*
+       * The reason for having ObjectID to KeyName mapping instead of OmKeyInfo
+       * is to reduce the memory footprint.
+       */
+      final PersistentMap<Long, String> oldObjIdToKeyPersistentMap =
+          new RocksDbPersistentMap<>(rocksDB,
+              fromSnapshotColumnFamily,
+              codecRegistry,
+              Long.class,
+              String.class);
+
+      // Long --> const. length
+      // String --> var. length "/dir1/dir2/dir3/dir4/dir5/key1"

Review Comment:
   Having some comments here would help as to how these different maps are used.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] prashantpogde commented on pull request #4190: HDDS-7800. Use persistent map and set to generate snapshot diff report

Posted by "prashantpogde (via GitHub)" <gi...@apache.org>.
prashantpogde commented on PR #4190:
URL: https://github.com/apache/ozone/pull/4190#issuecomment-1414299214

   @hemantk-12 let us rebase this PR to master branch now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hemantk-12 commented on a diff in pull request #4190: HDDS-7800. [Snapshot] Use persistent map and set to generate snapshot diff report

Posted by "hemantk-12 (via GitHub)" <gi...@apache.org>.
hemantk-12 commented on code in PR #4190:
URL: https://github.com/apache/ozone/pull/4190#discussion_r1102200036


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -113,54 +134,154 @@ public SnapshotDiffReport getSnapshotDiffReport(final String volume,
     final BucketLayout bucketLayout = getBucketLayout(volume, bucket,
         fromSnapshot.getMetadataManager());
 
-    /*
-     * The reason for having ObjectID to KeyName mapping instead of OmKeyInfo
-     * is to reduce the memory footprint.
-     */
-    final Map<Long, String> oldObjIdToKeyMap = new HashMap<>();
-    // Long --> const. length
-    // String --> var. length "/dir1/dir2/dir3/dir4/dir5/key1"
-    final Map<Long, String> newObjIdToKeyMap = new HashMap<>();
-
-    final Set<Long> objectIDsToCheck = new HashSet<>();
-
-    // add to object ID map for key/file.
-
-    final Table<String, OmKeyInfo> fsKeyTable = fromSnapshot
-        .getMetadataManager().getKeyTable(bucketLayout);
-    final Table<String, OmKeyInfo> tsKeyTable = toSnapshot
-        .getMetadataManager().getKeyTable(bucketLayout);
-    final Set<String> deltaFilesForKeyOrFileTable =
-        getDeltaFiles(fromSnapshot, toSnapshot,
-            Collections.singletonList(fsKeyTable.getName()),
-            fsInfo, tsInfo, volume, bucket);
-
-    addToObjectIdMap(fsKeyTable, tsKeyTable, deltaFilesForKeyOrFileTable,
-        oldObjIdToKeyMap, newObjIdToKeyMap, objectIDsToCheck, false);
-
-    if (bucketLayout.isFileSystemOptimized()) {
-      // add to object ID map for directory.
-      final Table<String, OmDirectoryInfo> fsDirTable =
-          fromSnapshot.getMetadataManager().getDirectoryTable();
-      final Table<String, OmDirectoryInfo> tsDirTable =
-          toSnapshot.getMetadataManager().getDirectoryTable();
-      final Set<String> deltaFilesForDirTable =
+    // TODO: This should comes from request itself.
+    String requestId = UUID.randomUUID().toString();
+
+    ColumnFamilyHandle fromSnapshotColumnFamily = null;
+    ColumnFamilyHandle toSnapshotColumnFamily = null;
+    ColumnFamilyHandle objectIDsColumnFamily = null;
+    ColumnFamilyHandle diffReportColumnFamily = null;
+
+    try {
+      // RequestId is prepended to column family name to make it unique
+      // for request.
+      fromSnapshotColumnFamily = db.get().createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-fromSnapshot"),
+              new ManagedColumnFamilyOptions()));
+      toSnapshotColumnFamily = db.get().createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-toSnapshot"),
+              new ManagedColumnFamilyOptions()));
+      objectIDsColumnFamily = db.get().createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-objectIDs"),
+              new ManagedColumnFamilyOptions()));
+      diffReportColumnFamily = db.get().createColumnFamily(
+          new ColumnFamilyDescriptor(
+              codecRegistry.asRawData(requestId + "-diffReport"),
+              new ManagedColumnFamilyOptions()));
+
+      // ObjectId to keyName map to keep key info for fromSnapshot.
+      // objectIdToKeyNameMap is used to identify what keys were touched
+      // in which snapshot and to know the difference if operation was
+      // creation, deletion, modify or rename.
+      // Stores only keyName instead of OmKeyInfo to reduce the memory
+      // footprint.
+      final PersistentMap<Long, String> objectIdToKeyNameMapForFromSnapshot =
+          new RocksDbPersistentMap<>(db,
+              fromSnapshotColumnFamily,
+              codecRegistry,
+              Long.class,
+              String.class);
+
+      // ObjectId to keyName map to keep key info for toSnapshot.
+      final PersistentMap<Long, String> objectIdToKeyNameMapForToSnapshot =
+          new RocksDbPersistentMap<>(db,
+              toSnapshotColumnFamily,
+              codecRegistry,
+              Long.class,
+              String.class);
+
+      // Set of unique objectId between fromSnapshot and toSnapshot.
+      final PersistentSet<Long> objectIDsToCheckMap =

Review Comment:
   Yes, if you look at the implementation of [RocksDbPersistentSet](https://github.com/apache/ozone/pull/4190/files#diff-3fd97e8db55b5b2c5840e959a0439a45e917f7e9c53afe5a60f6d5e6b7203b96R51). I'm using set entry as key for RocksDB and value is empty byte array.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] smengcl commented on pull request #4190: HDDS-7800. [Snapshot] Use persistent map and set to generate snapshot diff report

Posted by "smengcl (via GitHub)" <gi...@apache.org>.
smengcl commented on PR #4190:
URL: https://github.com/apache/ozone/pull/4190#issuecomment-1431709756

   Thanks @hemantk-12 for this important improvement to SnapDiff. Thanks @GeorgeJahad @prashantpogde @sadanand48 @DaveTeng0 @duongkame for the reviews and comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org