Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/06/29 08:30:34 UTC

[GitHub] [hadoop-ozone] ChenSammi opened a new pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

ChenSammi opened a new pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147


   https://issues.apache.org/jira/browse/HDDS-3892
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] sodonnel commented on a change in pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#discussion_r456372422



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
##########
@@ -117,30 +119,57 @@ public ReferenceCountedDB getDB(long containerID, String containerDBType,
       throws IOException {
     Preconditions.checkState(containerID >= 0,
         "Container ID cannot be negative.");
-    lock.lock();
+    ReferenceCountedDB db;
+    Lock containerLock = rocksDBLock.get(containerDBPath);
+    containerLock.lock();
     try {
-      ReferenceCountedDB db = (ReferenceCountedDB) this.get(containerDBPath);
+      lock.lock();
+      try {
+        db = (ReferenceCountedDB) this.get(containerDBPath);
+        if (db != null) {
+          db.incrementReference();
+          return db;
+        }
+      } finally {
+        lock.unlock();
+      }
 
-      if (db == null) {
+      try {
         MetadataStore metadataStore =
             MetadataStoreBuilder.newBuilder()
-            .setDbFile(new File(containerDBPath))
-            .setCreateIfMissing(false)
-            .setConf(conf)
-            .setDBType(containerDBType)
-            .build();
+                .setDbFile(new File(containerDBPath))
+                .setCreateIfMissing(false)
+                .setConf(conf)
+                .setDBType(containerDBType)
+                .build();
         db = new ReferenceCountedDB(metadataStore, containerDBPath);
-        this.put(containerDBPath, db);
+      } catch (Exception e) {
+        LOG.error("Error opening DB. Container:{} ContainerPath:{}",
+            containerID, containerDBPath, e);
+        throw e;
+      }
+
+      lock.lock();
+      try {
+        ReferenceCountedDB currentDB =
+            (ReferenceCountedDB) this.get(containerDBPath);
+        if (currentDB != null) {
+          // increment the reference before returning the object
+          currentDB.incrementReference();
+          // clean the db created in previous step
+          db.cleanup();

Review comment:
       This cleanup() call may be expensive, and ideally it should be outside the lock. However, I think it will be very rare that you hit this scenario, and therefore I think the logic is OK as it is now.
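
   (For reference, one way to move that expensive call out of the critical section is to only remember the stale handle while holding the lock and close it after unlocking. This is a minimal sketch reusing the names from the diff above; the surrounding method shape is simplified:)

    ```
    ReferenceCountedDB staleDb = null;
    lock.lock();
    try {
      ReferenceCountedDB currentDB = (ReferenceCountedDB) this.get(containerDBPath);
      if (currentDB != null) {
        // another thread already cached a handle for this path; use it
        currentDB.incrementReference();
        staleDb = db;        // remember the redundant instance we built
        db = currentDB;
      } else {
        this.put(containerDBPath, db);
        db.incrementReference();
      }
    } finally {
      lock.unlock();
    }
    if (staleDb != null) {
      staleDb.cleanup();     // the expensive close now runs outside the lock
    }
    return db;
    ```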






[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#discussion_r450182959



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
##########
@@ -117,28 +117,49 @@ public ReferenceCountedDB getDB(long containerID, String containerDBType,
       throws IOException {
     Preconditions.checkState(containerID >= 0,
         "Container ID cannot be negative.");
+    ReferenceCountedDB db;
     lock.lock();
     try {
-      ReferenceCountedDB db = (ReferenceCountedDB) this.get(containerDBPath);
-
-      if (db == null) {
-        MetadataStore metadataStore =
-            MetadataStoreBuilder.newBuilder()
-            .setDbFile(new File(containerDBPath))
-            .setCreateIfMissing(false)
-            .setConf(conf)
-            .setDBType(containerDBType)
-            .build();
-        db = new ReferenceCountedDB(metadataStore, containerDBPath);
-        this.put(containerDBPath, db);
+      db = (ReferenceCountedDB) this.get(containerDBPath);
+      if (db != null) {
+        db.incrementReference();
+        return db;
       }
-      // increment the reference before returning the object
-      db.incrementReference();
-      return db;
+    } finally {
+      lock.unlock();
+    }
+
+    try {
+      MetadataStore metadataStore =
+          MetadataStoreBuilder.newBuilder()
+              .setDbFile(new File(containerDBPath))
+              .setCreateIfMissing(false)
+              .setConf(conf)
+              .setDBType(containerDBType)
+              .build();

Review comment:
       > > I see one issue with this approach.
   > > If the database is already opened, and if we try to open again we will get this error.
   > > I think, with this change, we will throw an exception if we try to open an already-opened database again.
   > > java.io.IOException: Failed init RocksDB, db path : /Users/bviswanadham/workspace/hadoop-ozone/hadoop-hdds/container-service/target/test-dir/xCkBnsLVrc/cont1, exception :org.rocksdb.RocksDBException lock : /Users/bviswanadham/workspace/hadoop-ozone/hadoop-hdds/container-service/target/test-dir/xCkBnsLVrc/cont1/LOCK: No locks available
   > 
   > @bharatviswa504 , I get your point. It's an issue, but not introduced by this patch. It's a currently existing issue and we need to think carefully about how to fix it in a new JIRA.
   
   @ChenSammi can you please clarify why you think it's an existing issue?  It seems to me that calling `build()` outside of the lock introduces this problem.
   
   If I understand correctly, the performance issue is due to using a single lock for all containers.  Instead of moving the expensive part outside of the lock, how about using a separate lock per `containerDBPath`?
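
   (A minimal sketch of what a per-`containerDBPath` lock could look like, using Guava's Striped locks as this PR later adopted. The field and method names below are illustrative only, and `lookupOrOpen()` is a hypothetical helper standing in for the cache lookup plus RocksDB open:)

    ```
    import java.util.concurrent.locks.Lock;
    import com.google.common.util.concurrent.Striped;

    // One lock per containerDBPath, hashed into a fixed number of stripes:
    // opening DB A no longer blocks opening DB B, while two threads opening
    // the same path still serialize on the same stripe.
    private final Striped<Lock> dbPathLock = Striped.lazyWeakLock(1024);

    private ReferenceCountedDB openOrGet(String containerDBPath)
        throws IOException {
      Lock pathLock = dbPathLock.get(containerDBPath);
      pathLock.lock();
      try {
        // cache lookup and, if needed, the RocksDB open go here,
        // as in getDB() above; lookupOrOpen() is a hypothetical helper.
        return lookupOrOpen(containerDBPath);
      } finally {
        pathLock.unlock();
      }
    }
    ```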






[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#issuecomment-662263104


   Thanks @bharatviswa504 @sodonnel and @adoroszlai for the code review and great suggestions. 




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#issuecomment-651514206


   @bharatviswa504  we have HDDS-3217 deployed. 




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#issuecomment-652176025


   > 
   > 
   > I see one issue with this approach.
   > If the database is already opened, and if we try to open again we will get this error.
   > 
   > I think, with this change, we will throw an exception if we try to open an already-opened database again.
   > 
   > java.io.IOException: Failed init RocksDB, db path : /Users/bviswanadham/workspace/hadoop-ozone/hadoop-hdds/container-service/target/test-dir/xCkBnsLVrc/cont1, exception :org.rocksdb.RocksDBException lock : /Users/bviswanadham/workspace/hadoop-ozone/hadoop-hdds/container-service/target/test-dir/xCkBnsLVrc/cont1/LOCK: No locks available
   
   @bharatviswa504 , I get your point. It's an issue, but not introduced by this patch. It's a currently existing issue and we need to think carefully about how to fix it in a new JIRA.




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#issuecomment-653425644


   The failed integration test is irrelevant.




[GitHub] [hadoop-ozone] ChenSammi commented on a change in pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on a change in pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#discussion_r455480045



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
##########
@@ -113,22 +113,41 @@ protected boolean removeLRU(LinkEntry entry) {
    * @return ReferenceCountedDB.
    */
   public ReferenceCountedDB getDB(long containerID, String containerDBType,
-                             String containerDBPath, ConfigurationSource conf)
+      String containerDBPath, ConfigurationSource conf)
+      throws IOException {
+    return getDB(containerID, containerDBType, containerDBPath, conf, true);
+  }
+  /**
+   * Returns a DB handle if available, create the handler otherwise.
+   *
+   * @param containerID - ID of the container.
+   * @param containerDBType - DB type of the container.
+   * @param containerDBPath - DB path of the container.
+   * @param acquireLock - false only for one-time ContainerReader execution
+   *                    during datanode initialization. Don't set it to false
+   *                    in other cases.
+   * @param conf - Hadoop Configuration.
+   * @return ReferenceCountedDB.
+   */
+  public ReferenceCountedDB getDB(long containerID, String containerDBType,
+      String containerDBPath, ConfigurationSource conf, boolean acquireLock)
       throws IOException {
     Preconditions.checkState(containerID >= 0,
         "Container ID cannot be negative.");
-    lock.lock();
+    ReferenceCountedDB db;
+    if (acquireLock) {
+      lock.lock();
+    }
     try {
-      ReferenceCountedDB db = (ReferenceCountedDB) this.get(containerDBPath);
-
+      db = (ReferenceCountedDB) this.get(containerDBPath);
       if (db == null) {
         MetadataStore metadataStore =
             MetadataStoreBuilder.newBuilder()
-            .setDbFile(new File(containerDBPath))
-            .setCreateIfMissing(false)
-            .setConf(conf)
-            .setDBType(containerDBType)
-            .build();
+                .setDbFile(new File(containerDBPath))
+                .setCreateIfMissing(false)
+                .setConf(conf)
+                .setDBType(containerDBType)
+                .build();
         db = new ReferenceCountedDB(metadataStore, containerDBPath);
         this.put(containerDBPath, db);

Review comment:
       Thanks @sodonnel , I will take a look at the Striped lock.






[GitHub] [hadoop-ozone] ChenSammi commented on a change in pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on a change in pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#discussion_r446959699



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
##########
@@ -117,28 +117,47 @@ public ReferenceCountedDB getDB(long containerID, String containerDBType,
       throws IOException {
     Preconditions.checkState(containerID >= 0,
         "Container ID cannot be negative.");
+    ReferenceCountedDB db;
     lock.lock();
     try {
-      ReferenceCountedDB db = (ReferenceCountedDB) this.get(containerDBPath);
-
-      if (db == null) {
-        MetadataStore metadataStore =
-            MetadataStoreBuilder.newBuilder()
-            .setDbFile(new File(containerDBPath))
-            .setCreateIfMissing(false)
-            .setConf(conf)
-            .setDBType(containerDBType)
-            .build();
-        db = new ReferenceCountedDB(metadataStore, containerDBPath);
-        this.put(containerDBPath, db);
+      db = (ReferenceCountedDB) this.get(containerDBPath);
+      if (db != null) {
+        db.incrementReference();
+        return db;
       }
-      // increment the reference before returning the object
-      db.incrementReference();
-      return db;
+    } finally {
+      lock.unlock();
+    }
+
+    try {
+      MetadataStore metadataStore =
+          MetadataStoreBuilder.newBuilder()
+              .setDbFile(new File(containerDBPath))
+              .setCreateIfMissing(false)
+              .setConf(conf)
+              .setDBType(containerDBType)
+              .build();
+      db = new ReferenceCountedDB(metadataStore, containerDBPath);
     } catch (Exception e) {
       LOG.error("Error opening DB. Container:{} ContainerPath:{}",
           containerID, containerDBPath, e);
       throw e;
+    }
+
+    lock.lock();
+    try {
+      ReferenceCountedDB currentDB =
+          (ReferenceCountedDB) this.get(containerDBPath);
+      if (currentDB != null) {
+        // increment the reference before returning the object
+        currentDB.incrementReference();
+        return currentDB;

Review comment:
       Thanks for the review. You're right. The new DB instance should be explicitly closed in this case.






[GitHub] [hadoop-ozone] sodonnel commented on a change in pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#discussion_r454947889



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
##########
@@ -113,22 +113,41 @@ protected boolean removeLRU(LinkEntry entry) {
    * @return ReferenceCountedDB.
    */
   public ReferenceCountedDB getDB(long containerID, String containerDBType,
-                             String containerDBPath, ConfigurationSource conf)
+      String containerDBPath, ConfigurationSource conf)
+      throws IOException {
+    return getDB(containerID, containerDBType, containerDBPath, conf, true);
+  }
+  /**
+   * Returns a DB handle if available, create the handler otherwise.
+   *
+   * @param containerID - ID of the container.
+   * @param containerDBType - DB type of the container.
+   * @param containerDBPath - DB path of the container.
+   * @param acquireLock - false only for one-time ContainerReader execution
+   *                    during datanode initialization. Don't set it to false
+   *                    in other cases.
+   * @param conf - Hadoop Configuration.
+   * @return ReferenceCountedDB.
+   */
+  public ReferenceCountedDB getDB(long containerID, String containerDBType,
+      String containerDBPath, ConfigurationSource conf, boolean acquireLock)
       throws IOException {
     Preconditions.checkState(containerID >= 0,
         "Container ID cannot be negative.");
-    lock.lock();
+    ReferenceCountedDB db;
+    if (acquireLock) {
+      lock.lock();
+    }
     try {
-      ReferenceCountedDB db = (ReferenceCountedDB) this.get(containerDBPath);
-
+      db = (ReferenceCountedDB) this.get(containerDBPath);
       if (db == null) {
         MetadataStore metadataStore =
             MetadataStoreBuilder.newBuilder()
-            .setDbFile(new File(containerDBPath))
-            .setCreateIfMissing(false)
-            .setConf(conf)
-            .setDBType(containerDBType)
-            .build();
+                .setDbFile(new File(containerDBPath))
+                .setCreateIfMissing(false)
+                .setConf(conf)
+                .setDBType(containerDBType)
+                .build();
         db = new ReferenceCountedDB(metadataStore, containerDBPath);
         this.put(containerDBPath, db);

Review comment:
       If I understand correctly, you have created a boolean to either lock or not lock in this method. In the DN initialization logic, you pass false here to avoid locking, as the same container will not be initialized by multiple threads at once.
   
   I am still not sure this is safe, specifically the line:
   
   ```
   this.put(containerDBPath, db);
   ```
   
   ContainerCache is an instance of LRUCache and the JavaDoc says:
   
   > Note that LRUMap is not synchronized and is not thread-safe.
   > If you wish to use this map from multiple threads concurrently, you must use
   > appropriate synchronization.
   
   Therefore I think you must lock around access and writes to the LRU Cache.
   
   I also worry about removing locking completely in case someone passes "false" mistakenly to this method in the future.
   
   @adoroszlai Earlier suggested using a lock per container path. We might be able to use a Guava Striped lock here with something like 256 buckets.
   
   Could we do something like:
   
   ```
     public ReferenceCountedDB getDB(long containerID, String containerDBType,
                                String containerDBPath, ConfigurationSource conf)
         throws IOException {
       Preconditions.checkState(containerID >= 0,
           "Container ID cannot be negative.");
       ReferenceCountedDB db = null;
       Lock slock = stripedLock.get(containerDBPath);
       slock.lock();
       try {
         lock.lock();
         try {
           db = (ReferenceCountedDB) this.get(containerDBPath);
         } finally {
           lock.unlock();
         }
         if (db == null) {
           MetadataStore metadataStore =
               MetadataStoreBuilder.newBuilder()
                   .setDbFile(new File(containerDBPath))
                   .setCreateIfMissing(false)
                   .setConf(conf)
                   .setDBType(containerDBType)
                   .build();
           db = new ReferenceCountedDB(metadataStore, containerDBPath);
           addDB(containerDBPath, db);
         }
         db.incrementReference();
         return db;
       } catch (Exception e) {
         LOG.error("Error opening DB. Container:{} ContainerPath:{}",
             containerID, containerDBPath, e);
         throw e;
       } finally {
         slock.unlock();
       }
     }
   ```
   
   We still have the global lock to protect the ContainerCache structure, but we now have a striped lock to ensure that the same ContainerPath cannot be initialized at the same time.
   
   I am not sure if we need to worry about another thread calling addDB and adding another instance of the same DB to the cache. Perhaps not at DN initialization time, but at runtime. If that is the case, we would need to add back the logic you had before, where you take the global lock again, test to see if an entry for the DB has been added to the cache, and if so close the new instance and return the cached one.
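   
   (For completeness, a compact sketch of that fallback, essentially what the merged diff near the top of this thread ends up doing: after building the new handle outside the global lock, re-check the cache under the lock and discard the redundant instance if another thread won the race.)

    ```
    lock.lock();
    try {
      ReferenceCountedDB currentDB = (ReferenceCountedDB) this.get(containerDBPath);
      if (currentDB != null) {
        // another thread cached this path while we were opening RocksDB
        currentDB.incrementReference();
        db.cleanup();        // discard the redundant instance we just built
        return currentDB;
      }
      this.put(containerDBPath, db);
      db.incrementReference();
      return db;
    } finally {
      lock.unlock();
    }
    ```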
   






[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#issuecomment-651480464


   Hi @ChenSammi 
   Thanks for the PR.
   I too have the same comment as @sodonnel. I think we need to close the newly created DB instance if we get an instance from the cache.
   
   And also, do you have this version on your deployed cluster?
   https://github.com/apache/hadoop-ozone/commit/a2ab8d6e35f60af9762a191265942071755329be
   
   This saves iterating the DB to compute block metadata, and gets all the required metadata from db.get(). We have observed this issue during a billion-object test.
   For older containers created before HDDS-3217, we still iterate and set block metadata. (But this fixed a bug where we used to iterate thrice for open containers.)




[GitHub] [hadoop-ozone] ChenSammi commented on a change in pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on a change in pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#discussion_r456398538



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
##########
@@ -117,30 +119,57 @@ public ReferenceCountedDB getDB(long containerID, String containerDBType,
       throws IOException {
     Preconditions.checkState(containerID >= 0,
         "Container ID cannot be negative.");
-    lock.lock();
+    ReferenceCountedDB db;
+    Lock containerLock = rocksDBLock.get(containerDBPath);
+    containerLock.lock();
     try {
-      ReferenceCountedDB db = (ReferenceCountedDB) this.get(containerDBPath);
+      lock.lock();
+      try {
+        db = (ReferenceCountedDB) this.get(containerDBPath);
+        if (db != null) {
+          db.incrementReference();
+          return db;
+        }
+      } finally {
+        lock.unlock();
+      }
 
-      if (db == null) {
+      try {
         MetadataStore metadataStore =
             MetadataStoreBuilder.newBuilder()
-            .setDbFile(new File(containerDBPath))
-            .setCreateIfMissing(false)
-            .setConf(conf)
-            .setDBType(containerDBType)
-            .build();
+                .setDbFile(new File(containerDBPath))
+                .setCreateIfMissing(false)
+                .setConf(conf)
+                .setDBType(containerDBType)
+                .build();
         db = new ReferenceCountedDB(metadataStore, containerDBPath);
-        this.put(containerDBPath, db);
+      } catch (Exception e) {
+        LOG.error("Error opening DB. Container:{} ContainerPath:{}",
+            containerID, containerDBPath, e);
+        throw e;
+      }
+
+      lock.lock();
+      try {
+        ReferenceCountedDB currentDB =
+            (ReferenceCountedDB) this.get(containerDBPath);
+        if (currentDB != null) {
+          // increment the reference before returning the object
+          currentDB.incrementReference();
+          // clean the db created in previous step
+          db.cleanup();

Review comment:
       OK.






[GitHub] [hadoop-ozone] ChenSammi merged pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
ChenSammi merged pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147


   




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#issuecomment-659112467


   > > Thanks @bharatviswa504 and @adoroszlai for the review. A lock per containerDBPath in ContainerCache is a good idea. I tried it, but it cannot solve our problem: given a volume with nearly 20K containers, it still takes nearly an hour for the datanode to finish initialization.
   > 
   > With a lock per containerPath, where does the bottleneck come from? Is it around the lock, or does a single thread load all the containers for a volume, and therefore it's the time taken to init all the DB instances that is the bottleneck?
   
   The DB open takes time, and the lock forces the DB opens to happen one by one. Once a volume has more than 10K RocksDB instances, initialization takes considerable time, say more than 30 minutes.
   There are several severe consequences of slow datanode initialization:
   1. the service is unavailable
   2. SCM will treat the datanode as DEAD (we have raised the DEAD threshold to 30m in our cluster) and close all the pipelines and containers related to this datanode, which contributed to why there are so many small containers in our cluster. It's a negative cycle.
   
   




[GitHub] [hadoop-ozone] sodonnel commented on pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#issuecomment-658687002


   > Thanks @bharatviswa504 and @adoroszlai for the review. A lock per containerDBPath in ContainerCache is a good idea. I tried it, but it cannot solve our problem: given a volume with nearly 20K containers, it still takes nearly an hour for the datanode to finish initialization.
   
   With a lock per containerPath, where does the bottleneck come from? Is it around the lock, or does a single thread load all the containers for a volume, and therefore it's the time taken to init all the DB instances that is the bottleneck?




[GitHub] [hadoop-ozone] ChenSammi commented on a change in pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on a change in pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#discussion_r451408251



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
##########
@@ -117,28 +117,49 @@ public ReferenceCountedDB getDB(long containerID, String containerDBType,
       throws IOException {
     Preconditions.checkState(containerID >= 0,
         "Container ID cannot be negative.");
+    ReferenceCountedDB db;
     lock.lock();
     try {
-      ReferenceCountedDB db = (ReferenceCountedDB) this.get(containerDBPath);
-
-      if (db == null) {
-        MetadataStore metadataStore =
-            MetadataStoreBuilder.newBuilder()
-            .setDbFile(new File(containerDBPath))
-            .setCreateIfMissing(false)
-            .setConf(conf)
-            .setDBType(containerDBType)
-            .build();
-        db = new ReferenceCountedDB(metadataStore, containerDBPath);
-        this.put(containerDBPath, db);
+      db = (ReferenceCountedDB) this.get(containerDBPath);
+      if (db != null) {
+        db.incrementReference();
+        return db;
       }
-      // increment the reference before returning the object
-      db.incrementReference();
-      return db;
+    } finally {
+      lock.unlock();
+    }
+
+    try {
+      MetadataStore metadataStore =
+          MetadataStoreBuilder.newBuilder()
+              .setDbFile(new File(containerDBPath))
+              .setCreateIfMissing(false)
+              .setConf(conf)
+              .setDBType(containerDBType)
+              .build();

Review comment:
       @adoroszlai, OK, I get the point now. During the datanode initialization phase there is actually no chance that a DB will be opened concurrently. But since this function is also used in later phases, there is a possibility of triggering the exception.






[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#issuecomment-656074756


   Thanks @bharatviswa504 and @adoroszlai for the review. A lock per containerDBPath in ContainerCache is a good idea. I tried it, but it cannot solve our problem: given a volume with nearly 20K containers, it still takes nearly an hour for the datanode to finish initialization.




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#issuecomment-660072042


   > 
   > 
   > These changes with the striped lock LGTM, +1.
   > 
   > One question - have you been able to test this change on your DNs with a large number of containers, and did it give a good improvement?
   
   Yes, I have tested it on a datanode with 20K+ containers; it takes about 30m to finish the ContainerSet build. Before the patch, the time spent is about 10 times longer. If we skip the lock during the ContainerSet build, it takes about 15m.
   




[GitHub] [hadoop-ozone] ChenSammi edited a comment on pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
ChenSammi edited a comment on pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#issuecomment-660072042


   > 
   > 
   > These changes with the striped lock LGTM, +1.
   > 
   > One question - have you been able to test this change on your DNs with a large number of containers, and did it give a good improvement?
   
   Thanks @sodonnel . 
   Yes, I have tested it on a datanode with 20K+ containers; it takes about 30m to finish the ContainerSet build. Before the patch, the time spent is about 10 times longer. If we skip the lock during the ContainerSet build, it takes about 15m.
   
   
   




[GitHub] [hadoop-ozone] sodonnel commented on pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#issuecomment-660110616


   > > These changes with the striped lock LGTM, +1.
   > > One question - have you been able to test this change on your DNs with a large number of containers, and did it give a good improvement?
   > 
   > Thanks @sodonnel .
   > Yes, I have tested it on a datanode with 20K+ containers; it takes about 30m to finish the ContainerSet build. Before the patch, the time spent is about 10 times longer. If we skip the lock during the ContainerSet build, it takes about 15m.
   
   This seems like a good improvement for a relatively simple change. I think removing the lock completely is risky and may cause some other problems, either now or later.
   
   @bharatviswa504 @adoroszlai  - have you guys any further comments or concerns with the latest patch?




[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#discussion_r456456312



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
##########
@@ -43,6 +44,7 @@
   private final Lock lock = new ReentrantLock();
   private static ContainerCache cache;
   private static final float LOAD_FACTOR = 0.75f;
+  private final Striped<Lock> rocksDBLock = Striped.lazyWeakLock(1024);

Review comment:
       Would be nice to make the number of stripes configurable (later).
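
   (A rough sketch of what that could look like. The config key name below is purely illustrative and does not exist in the codebase, the constructor shape is simplified, and it assumes ConfigurationSource exposes getInt like Hadoop's Configuration:)

    ```
    // Hypothetical config key for the stripe count; 1024 stays the default.
    private static final String DB_LOCK_STRIPES_KEY =
        "hdds.container.cache.db.lock.stripes";
    private static final int DB_LOCK_STRIPES_DEFAULT = 1024;

    private final Striped<Lock> rocksDBLock;

    private ContainerCache(ConfigurationSource conf /*, existing args */) {
      // ... existing initialization ...
      rocksDBLock = Striped.lazyWeakLock(
          conf.getInt(DB_LOCK_STRIPES_KEY, DB_LOCK_STRIPES_DEFAULT));
    }
    ```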

##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestContainerCache.java
##########
@@ -63,6 +70,8 @@ public void testContainerCacheEviction() throws Exception {
     conf.setInt(OzoneConfigKeys.OZONE_CONTAINER_CACHE_SIZE, 2);
 
     ContainerCache cache = ContainerCache.getInstance(conf);
+    cache.clear();
+    Assert.assertTrue(cache.size() == 0);

Review comment:
       ```suggestion
       Assert.assertEquals(0, cache.size());
   ```

##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestContainerCache.java
##########
@@ -123,4 +132,47 @@ public void testContainerCacheEviction() throws Exception {
     thrown.expect(IllegalArgumentException.class);
     db5.close();
   }
+
+  @Test
+  public void testConcurrentDBGet() throws Exception {
+    File root = new File(testRoot);
+    root.mkdirs();
+    root.deleteOnExit();
+
+    OzoneConfiguration conf = new OzoneConfiguration();
+    conf.setInt(OzoneConfigKeys.OZONE_CONTAINER_CACHE_SIZE, 2);
+    ContainerCache cache = ContainerCache.getInstance(conf);
+    cache.clear();
+    Assert.assertTrue(cache.size() == 0);

Review comment:
       ```suggestion
       Assert.assertEquals(0, cache.size());
   ```

##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestContainerCache.java
##########
@@ -123,4 +132,47 @@ public void testContainerCacheEviction() throws Exception {
     thrown.expect(IllegalArgumentException.class);
     db5.close();
   }
+
+  @Test
+  public void testConcurrentDBGet() throws Exception {
+    File root = new File(testRoot);
+    root.mkdirs();
+    root.deleteOnExit();
+
+    OzoneConfiguration conf = new OzoneConfiguration();
+    conf.setInt(OzoneConfigKeys.OZONE_CONTAINER_CACHE_SIZE, 2);
+    ContainerCache cache = ContainerCache.getInstance(conf);
+    cache.clear();
+    Assert.assertTrue(cache.size() == 0);
+    File containerDir = new File(root, "cont1");
+    createContainerDB(conf, containerDir);
+    ExecutorService executorService = Executors.newFixedThreadPool(2);
+    Runnable task = () -> {
+      try {
+        ReferenceCountedDB db1 = cache.getDB(1, "RocksDB",
+            containerDir.getPath(), conf);
+        Assert.assertTrue(db1 != null);
+      } catch (IOException e) {
+        Assert.fail("Should get the DB instance");
+      }
+    };
+    List<Future> futureList = new ArrayList<>();
+    futureList.add(executorService.submit(task));
+    futureList.add(executorService.submit(task));
+    for (Future future: futureList) {
+      try {
+        future.get();
+      } catch (InterruptedException| ExecutionException e) {
+        Assert.fail("Should get the DB instance");
+      }
+    }
+
+    ReferenceCountedDB db = cache.getDB(1, "RocksDB",
+        containerDir.getPath(), conf);
+    db.close();
+    db.close();
+    db.close();
+    Assert.assertTrue(cache.size() == 1);

Review comment:
       ```suggestion
       Assert.assertEquals(1, cache.size());
   ```

##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestContainerReader.java
##########
@@ -219,4 +220,68 @@ public void testContainerReader() throws Exception {
           keyValueContainerData.getNumPendingDeletionBlocks());
     }
   }
+
+  @Test
+  public void testMultipleContainerReader() throws Exception {
+    final int volumeNum = 10;
+    StringBuffer datanodeDirs = new StringBuffer();
+    File[] volumeDirs = new File[volumeNum];
+    for (int i = 0; i < volumeNum; i++) {
+      volumeDirs[i] = tempDir.newFolder();
+      datanodeDirs = datanodeDirs.append(volumeDirs[i]).append(",");
+    }
+    conf.set(ScmConfigKeys.HDDS_DATANODE_DIR_KEY,
+        datanodeDirs.toString());
+    MutableVolumeSet volumeSets =
+        new MutableVolumeSet(datanodeId.toString(), conf);
+    ContainerCache cache = ContainerCache.getInstance(conf);
+    cache.clear();
+
+    RoundRobinVolumeChoosingPolicy policy =
+        new RoundRobinVolumeChoosingPolicy();
+
+    final int containerCount = 100;
+    blockCount = containerCount;
+    for (int i = 0; i < containerCount; i++) {
+      KeyValueContainerData keyValueContainerData =
+          new KeyValueContainerData(i, ChunkLayOutVersion.FILE_PER_BLOCK,
+              (long) StorageUnit.GB.toBytes(5), UUID.randomUUID().toString(),
+              datanodeId.toString());
+
+      KeyValueContainer keyValueContainer =
+          new KeyValueContainer(keyValueContainerData,
+              conf);
+      keyValueContainer.create(volumeSets, policy, scmId);
+
+      List<Long> blkNames;
+      if (i % 2 == 0) {
+        blkNames = addBlocks(keyValueContainer, true);
+        markBlocksForDelete(keyValueContainer, true, blkNames, i);
+      } else {
+        blkNames = addBlocks(keyValueContainer, false);
+        markBlocksForDelete(keyValueContainer, false, blkNames, i);
+      }
+    }
+
+    List<HddsVolume> hddsVolumes = volumeSets.getVolumesList();
+    ContainerReader[] containerReaders = new ContainerReader[volumeNum];
+    Thread[] threads = new Thread[volumeNum];
+    for (int i = 0; i < volumeNum; i++) {
+      containerReaders[i] = new ContainerReader(volumeSets,
+          hddsVolumes.get(i), containerSet, conf);
+      threads[i] = new Thread(containerReaders[i]);
+    }
+    long startTime = System.currentTimeMillis();
+    for (int i = 0; i < volumeNum; i++) {
+      threads[i].start();
+    }
+    for (int i = 0; i < volumeNum; i++) {
+      threads[i].join();
+    }
+    System.out.println("Open " + volumeNum + " Volume with " + containerCount +
+        " costs " + (System.currentTimeMillis() - startTime) / 1000 + "s");
+    Assert.assertTrue(
+        containerSet.getContainerMap().entrySet().size() == containerCount);

Review comment:
       ```suggestion
       Assert.assertEquals(containerCount,
           containerSet.getContainerMap().entrySet().size());
   ```

##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestContainerReader.java
##########
@@ -219,4 +220,68 @@ public void testContainerReader() throws Exception {
           keyValueContainerData.getNumPendingDeletionBlocks());
     }
   }
+
+  @Test
+  public void testMultipleContainerReader() throws Exception {
+    final int volumeNum = 10;
+    StringBuffer datanodeDirs = new StringBuffer();
+    File[] volumeDirs = new File[volumeNum];
+    for (int i = 0; i < volumeNum; i++) {
+      volumeDirs[i] = tempDir.newFolder();
+      datanodeDirs = datanodeDirs.append(volumeDirs[i]).append(",");
+    }
+    conf.set(ScmConfigKeys.HDDS_DATANODE_DIR_KEY,
+        datanodeDirs.toString());
+    MutableVolumeSet volumeSets =
+        new MutableVolumeSet(datanodeId.toString(), conf);
+    ContainerCache cache = ContainerCache.getInstance(conf);
+    cache.clear();
+
+    RoundRobinVolumeChoosingPolicy policy =
+        new RoundRobinVolumeChoosingPolicy();
+
+    final int containerCount = 100;
+    blockCount = containerCount;
+    for (int i = 0; i < containerCount; i++) {
+      KeyValueContainerData keyValueContainerData =
+          new KeyValueContainerData(i, ChunkLayOutVersion.FILE_PER_BLOCK,
+              (long) StorageUnit.GB.toBytes(5), UUID.randomUUID().toString(),
+              datanodeId.toString());
+
+      KeyValueContainer keyValueContainer =
+          new KeyValueContainer(keyValueContainerData,
+              conf);
+      keyValueContainer.create(volumeSets, policy, scmId);
+
+      List<Long> blkNames;
+      if (i % 2 == 0) {
+        blkNames = addBlocks(keyValueContainer, true);
+        markBlocksForDelete(keyValueContainer, true, blkNames, i);
+      } else {
+        blkNames = addBlocks(keyValueContainer, false);
+        markBlocksForDelete(keyValueContainer, false, blkNames, i);
+      }
+    }
+
+    List<HddsVolume> hddsVolumes = volumeSets.getVolumesList();
+    ContainerReader[] containerReaders = new ContainerReader[volumeNum];
+    Thread[] threads = new Thread[volumeNum];
+    for (int i = 0; i < volumeNum; i++) {
+      containerReaders[i] = new ContainerReader(volumeSets,
+          hddsVolumes.get(i), containerSet, conf);
+      threads[i] = new Thread(containerReaders[i]);
+    }
+    long startTime = System.currentTimeMillis();
+    for (int i = 0; i < volumeNum; i++) {
+      threads[i].start();
+    }
+    for (int i = 0; i < volumeNum; i++) {
+      threads[i].join();
+    }
+    System.out.println("Open " + volumeNum + " Volume with " + containerCount +
+        " costs " + (System.currentTimeMillis() - startTime) / 1000 + "s");
+    Assert.assertTrue(
+        containerSet.getContainerMap().entrySet().size() == containerCount);
+    Assert.assertTrue(cache.size() == containerCount);

Review comment:
       ```suggestion
       Assert.assertEquals(containerCount, cache.size());
   ```

##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestContainerCache.java
##########
@@ -123,4 +132,47 @@ public void testContainerCacheEviction() throws Exception {
     thrown.expect(IllegalArgumentException.class);
     db5.close();
   }
+
+  @Test
+  public void testConcurrentDBGet() throws Exception {
+    File root = new File(testRoot);
+    root.mkdirs();
+    root.deleteOnExit();
+
+    OzoneConfiguration conf = new OzoneConfiguration();
+    conf.setInt(OzoneConfigKeys.OZONE_CONTAINER_CACHE_SIZE, 2);
+    ContainerCache cache = ContainerCache.getInstance(conf);
+    cache.clear();
+    Assert.assertTrue(cache.size() == 0);
+    File containerDir = new File(root, "cont1");
+    createContainerDB(conf, containerDir);
+    ExecutorService executorService = Executors.newFixedThreadPool(2);
+    Runnable task = () -> {
+      try {
+        ReferenceCountedDB db1 = cache.getDB(1, "RocksDB",
+            containerDir.getPath(), conf);
+        Assert.assertTrue(db1 != null);

Review comment:
       ```suggestion
           Assert.assertNotNull(db1);
   ```






[GitHub] [hadoop-ozone] sodonnel commented on a change in pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#discussion_r454947889



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
##########
@@ -113,22 +113,41 @@ protected boolean removeLRU(LinkEntry entry) {
    * @return ReferenceCountedDB.
    */
   public ReferenceCountedDB getDB(long containerID, String containerDBType,
-                             String containerDBPath, ConfigurationSource conf)
+      String containerDBPath, ConfigurationSource conf)
+      throws IOException {
+    return getDB(containerID, containerDBType, containerDBPath, conf, true);
+  }
+  /**
+   * Returns a DB handle if available, create the handler otherwise.
+   *
+   * @param containerID - ID of the container.
+   * @param containerDBType - DB type of the container.
+   * @param containerDBPath - DB path of the container.
+   * @param acquireLock - false only for one-time ContainerReader execution
+   *                    during datanode initialization. Don't set it to false
+   *                    in other cases.
+   * @param conf - Hadoop Configuration.
+   * @return ReferenceCountedDB.
+   */
+  public ReferenceCountedDB getDB(long containerID, String containerDBType,
+      String containerDBPath, ConfigurationSource conf, boolean acquireLock)
       throws IOException {
     Preconditions.checkState(containerID >= 0,
         "Container ID cannot be negative.");
-    lock.lock();
+    ReferenceCountedDB db;
+    if (acquireLock) {
+      lock.lock();
+    }
     try {
-      ReferenceCountedDB db = (ReferenceCountedDB) this.get(containerDBPath);
-
+      db = (ReferenceCountedDB) this.get(containerDBPath);
       if (db == null) {
         MetadataStore metadataStore =
             MetadataStoreBuilder.newBuilder()
-            .setDbFile(new File(containerDBPath))
-            .setCreateIfMissing(false)
-            .setConf(conf)
-            .setDBType(containerDBType)
-            .build();
+                .setDbFile(new File(containerDBPath))
+                .setCreateIfMissing(false)
+                .setConf(conf)
+                .setDBType(containerDBType)
+                .build();
         db = new ReferenceCountedDB(metadataStore, containerDBPath);
         this.put(containerDBPath, db);

Review comment:
       If I understand correctly, you have created a boolean to either lock or not lock in this method. In the DN initialization logic, you pass false here to avoid locking, as the same container will not be initialized by multiple threads at once.
   
   I am still not sure this is safe, specifically the line:
   
   ```
   this.put(containerDBPath, db);
   ```
   
   ContainerCache is an instance of LRUCache and the JavaDoc says:
   
   > Note that LRUMap is not synchronized and is not thread-safe.
   > If you wish to use this map from multiple threads concurrently, you must use
   > appropriate synchronization.
   
   Therefore I think you must lock around access and writes to the LRU Cache.
   
   I also worry about removing locking completely in case someone passes "false" mistakenly to this method in the future.
   
   @adoroszlai Earlier suggested using a lock per container path. We might be able to use a Guava Striped lock here with something like 256 buckets.
   
   Could we do something like:
   
   ```
     public ReferenceCountedDB getDB(long containerID, String containerDBType,
                                String containerDBPath, ConfigurationSource conf)
         throws IOException {
       Preconditions.checkState(containerID >= 0,
           "Container ID cannot be negative.");
       ReferenceCountedDB db = null;
       Lock slock = stripedLock.get(containerDBPath);
       slock.lock();
       try {
         lock.lock();
         try {
           db = (ReferenceCountedDB) this.get(containerDBPath);
         } finally {
           lock.unlock();
         }
         if (db == null) {
           MetadataStore metadataStore =
               MetadataStoreBuilder.newBuilder()
                   .setDbFile(new File(containerDBPath))
                   .setCreateIfMissing(false)
                   .setConf(conf)
                   .setDBType(containerDBType)
                   .build();
           db = new ReferenceCountedDB(metadataStore, containerDBPath);
           addDB(containerDBPath, db);
         }
         db.incrementReference();
         return db;
       } catch (Exception e) {
         LOG.error("Error opening DB. Container:{} ContainerPath:{}",
             containerID, containerDBPath, e);
         throw e;
       } finally {
         slock.unlock();
       }
     }
   ```
   
   We still have the global lock to protect the ContainerCache structure, but we now also have a striped lock to ensure that the same ContainerPath cannot be initialized by two threads at the same time.
   
   I am not sure whether we need to worry about another thread calling addDB. If we do, we would need to add back the logic you had before: take the global lock again, check whether an entry for this DB has already been added to the cache, and if so close the new instance.
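   
   For reference, a minimal sketch of how the `stripedLock` field used above could be declared, assuming Guava's `Striped` utility (the field and the stripe count are illustrative, not part of the PR):
   
   ```
   import java.util.concurrent.locks.Lock;
   import com.google.common.util.concurrent.Striped;
   
   // Striped.lock(n) hands out a fixed pool of n locks keyed by the hash of the
   // value passed to get(), so two different containerDBPath values may
   // occasionally share a lock; that only serializes a little extra work.
   private final Striped<Lock> stripedLock = Striped.lock(256);
   ```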
   





[GitHub] [hadoop-ozone] sodonnel commented on a change in pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#discussion_r454957121



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
##########
@@ -113,22 +113,41 @@ protected boolean removeLRU(LinkEntry entry) {
    * @return ReferenceCountedDB.
    */
   public ReferenceCountedDB getDB(long containerID, String containerDBType,
-                             String containerDBPath, ConfigurationSource conf)
+      String containerDBPath, ConfigurationSource conf)
+      throws IOException {
+    return getDB(containerID, containerDBType, containerDBPath, conf, true);
+  }
+  /**
+   * Returns a DB handle if available, create the handler otherwise.
+   *
+   * @param containerID - ID of the container.
+   * @param containerDBType - DB type of the container.
+   * @param containerDBPath - DB path of the container.
+   * @param acquireLock - false only for one-time ContainerReader execution
+   *                    during datanode initialization. Don't set it to false
+   *                    in other cases.
+   * @param conf - Hadoop Configuration.
+   * @return ReferenceCountedDB.
+   */
+  public ReferenceCountedDB getDB(long containerID, String containerDBType,
+      String containerDBPath, ConfigurationSource conf, boolean acquireLock)
       throws IOException {
     Preconditions.checkState(containerID >= 0,
         "Container ID cannot be negative.");
-    lock.lock();
+    ReferenceCountedDB db;
+    if (acquireLock) {
+      lock.lock();
+    }
     try {
-      ReferenceCountedDB db = (ReferenceCountedDB) this.get(containerDBPath);
-
+      db = (ReferenceCountedDB) this.get(containerDBPath);
       if (db == null) {
         MetadataStore metadataStore =
             MetadataStoreBuilder.newBuilder()
-            .setDbFile(new File(containerDBPath))
-            .setCreateIfMissing(false)
-            .setConf(conf)
-            .setDBType(containerDBType)
-            .build();
+                .setDbFile(new File(containerDBPath))
+                .setCreateIfMissing(false)
+                .setConf(conf)
+                .setDBType(containerDBType)
+                .build();
         db = new ReferenceCountedDB(metadataStore, containerDBPath);
         this.put(containerDBPath, db);

Review comment:
       Thinking about this some more, it might be safest to wrap addDB and removeDB with the stripedLock too, so that we ensure only one DB instance can be initialized and added to the map for a given path at any time.
   
   @adoroszlai might have a better idea than me on this too?
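   
   A rough sketch of what wrapping addDB in the striped lock could look like (the body and signature here are assumed from the snippets above rather than copied from the PR; removeDB would be wrapped the same way):
   
   ```
   public void addDB(String containerDBPath, ReferenceCountedDB db) {
     Lock slock = stripedLock.get(containerDBPath);
     slock.lock();
     try {
       lock.lock();
       try {
         // only register the handle if no other thread beat us to it
         if (!this.containsKey(containerDBPath)) {
           this.put(containerDBPath, db);
         }
       } finally {
         lock.unlock();
       }
     } finally {
       slock.unlock();
     }
   }
   ```
   
   If the path is already present, the caller would still need to clean up its redundant handle, as discussed above.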





[GitHub] [hadoop-ozone] sodonnel commented on a change in pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#discussion_r446880416



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
##########
@@ -117,28 +117,47 @@ public ReferenceCountedDB getDB(long containerID, String containerDBType,
       throws IOException {
     Preconditions.checkState(containerID >= 0,
         "Container ID cannot be negative.");
+    ReferenceCountedDB db;
     lock.lock();
     try {
-      ReferenceCountedDB db = (ReferenceCountedDB) this.get(containerDBPath);
-
-      if (db == null) {
-        MetadataStore metadataStore =
-            MetadataStoreBuilder.newBuilder()
-            .setDbFile(new File(containerDBPath))
-            .setCreateIfMissing(false)
-            .setConf(conf)
-            .setDBType(containerDBType)
-            .build();
-        db = new ReferenceCountedDB(metadataStore, containerDBPath);
-        this.put(containerDBPath, db);
+      db = (ReferenceCountedDB) this.get(containerDBPath);
+      if (db != null) {
+        db.incrementReference();
+        return db;
       }
-      // increment the reference before returning the object
-      db.incrementReference();
-      return db;
+    } finally {
+      lock.unlock();
+    }
+
+    try {
+      MetadataStore metadataStore =
+          MetadataStoreBuilder.newBuilder()
+              .setDbFile(new File(containerDBPath))
+              .setCreateIfMissing(false)
+              .setConf(conf)
+              .setDBType(containerDBType)
+              .build();
+      db = new ReferenceCountedDB(metadataStore, containerDBPath);
     } catch (Exception e) {
       LOG.error("Error opening DB. Container:{} ContainerPath:{}",
           containerID, containerDBPath, e);
       throw e;
+    }
+
+    lock.lock();
+    try {
+      ReferenceCountedDB currentDB =
+          (ReferenceCountedDB) this.get(containerDBPath);
+      if (currentDB != null) {
+        // increment the reference before returning the object
+        currentDB.incrementReference();
+        return currentDB;

Review comment:
       What happens to the ReferenceCountedDB instance created at line 140 if we return here?
   
   Do we need to call db.cleanup() on the newly created instance, and if we don't, could this potentially leak db objects? I note that the remove() call in ContainerCache calls db.cleanup when removing the entry, which makes me suspect this is important.
   
   If we do need to call cleanup() on it, we could probably capture the existing instance inside the lock, call cleanup() on the new one outside the lock, and then return?
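   
   A minimal sketch of that pattern, using the names from the diff above (illustrative only, not code from the PR):
   
   ```
   ReferenceCountedDB winner;
   lock.lock();
   try {
     winner = (ReferenceCountedDB) this.get(containerDBPath);
     if (winner == null) {
       // we won the race: register the instance we just created
       this.put(containerDBPath, db);
       winner = db;
     }
     winner.incrementReference();
   } finally {
     lock.unlock();
   }
   if (winner != db) {
     // another thread registered a handle first; close ours outside the lock
     db.cleanup();
   }
   return winner;
   ```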





[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#discussion_r457125905



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java
##########
@@ -104,6 +104,9 @@
   public static final String OZONE_CONTAINER_CACHE_SIZE =
       "ozone.container.cache.size";
   public static final int OZONE_CONTAINER_CACHE_DEFAULT = 1024;
+  public static final String OZONE_CONTAINER_CACHE_LOCK_STRIPS =
+      "ozone.container.cache.lock.strips";
+  public static final int OZONE_CONTAINER_CACHE_LOCK_STRIPS_DEFAULT = 1024;

Review comment:
       Note minor typo: `strips` -> `stripes`, but I'm fine with both.
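   
   For context, this is roughly how such a key would be consumed when sizing the striped lock (the getInt call and surrounding wiring are a sketch, not code from the PR):
   
   ```
   // stripe count from config; more stripes means less contention between
   // unrelated container paths at the cost of a few extra Lock objects
   int stripes = conf.getInt(OzoneConfigKeys.OZONE_CONTAINER_CACHE_LOCK_STRIPS,
       OzoneConfigKeys.OZONE_CONTAINER_CACHE_LOCK_STRIPS_DEFAULT);
   Striped<Lock> stripedLock = Striped.lock(stripes);
   ```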



