You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/07/15 10:21:33 UTC

[GitHub] [hadoop-ozone] sodonnel commented on a change in pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

sodonnel commented on a change in pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#discussion_r454947889



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
##########
@@ -113,22 +113,41 @@ protected boolean removeLRU(LinkEntry entry) {
    * @return ReferenceCountedDB.
    */
   public ReferenceCountedDB getDB(long containerID, String containerDBType,
-                             String containerDBPath, ConfigurationSource conf)
+      String containerDBPath, ConfigurationSource conf)
+      throws IOException {
+    return getDB(containerID, containerDBType, containerDBPath, conf, true);
+  }
+  /**
+   * Returns a DB handle if available, create the handler otherwise.
+   *
+   * @param containerID - ID of the container.
+   * @param containerDBType - DB type of the container.
+   * @param containerDBPath - DB path of the container.
+   * @param acquireLock - false only for one-time ContainerReader execution
+   *                    during datanode initialization. Don't set it to false
+   *                    in other cases.
+   * @param conf - Hadoop Configuration.
+   * @return ReferenceCountedDB.
+   */
+  public ReferenceCountedDB getDB(long containerID, String containerDBType,
+      String containerDBPath, ConfigurationSource conf, boolean acquireLock)
       throws IOException {
     Preconditions.checkState(containerID >= 0,
         "Container ID cannot be negative.");
-    lock.lock();
+    ReferenceCountedDB db;
+    if (acquireLock) {
+      lock.lock();
+    }
     try {
-      ReferenceCountedDB db = (ReferenceCountedDB) this.get(containerDBPath);
-
+      db = (ReferenceCountedDB) this.get(containerDBPath);
       if (db == null) {
         MetadataStore metadataStore =
             MetadataStoreBuilder.newBuilder()
-            .setDbFile(new File(containerDBPath))
-            .setCreateIfMissing(false)
-            .setConf(conf)
-            .setDBType(containerDBType)
-            .build();
+                .setDbFile(new File(containerDBPath))
+                .setCreateIfMissing(false)
+                .setConf(conf)
+                .setDBType(containerDBType)
+                .build();
         db = new ReferenceCountedDB(metadataStore, containerDBPath);
         this.put(containerDBPath, db);

Review comment:
       If I understand correctly, you have created a boolean to either lock or not lock in this method. In the DN initialization logic, you pass false here to avoid locking, as the same container will not be initialized by multiple threads at once.
   
   I am still not sure this is safe, specifically the line:
   
   ```
   this.put(containerDBPath, db);
   ```
   
   ContainerCache is an instance of LRUCache and the JavaDoc says:
   
   > Note that LRUMap is not synchronized and is not thread-safe.</strong>
   > If you wish to use this map from multiple threads concurrently, you must use
   > appropriate synchronization.
   
   Therefore I think you must lock around access and writes to the LRU Cache.
   
   I also worry about removing locking completely incase someone passes "false" mistakenly to this method in the future.
   
   @adoroszlai Earlier suggested using a lock per container path. We might be able to use a Guava Striped lock here with something like 256 buckets.
   
   Could we do something like:
   
   ```
     public ReferenceCountedDB getDB(long containerID, String containerDBType,
                                String containerDBPath, ConfigurationSource conf)
         throws IOException {
       Preconditions.checkState(containerID >= 0,
           "Container ID cannot be negative.");
       ReferenceCountedDB db = null;
       Lock slock = stripedLock.get(containerDBPath);
       slock.lock();
       try {
         lock.lock();
         try {
           db = (ReferenceCountedDB) this.get(containerDBPath);
         } finally {
           lock.unlock();
         }
         if (db == null) {
           MetadataStore metadataStore =
               MetadataStoreBuilder.newBuilder()
                   .setDbFile(new File(containerDBPath))
                   .setCreateIfMissing(false)
                   .setConf(conf)
                   .setDBType(containerDBType)
                   .build();
           db = new ReferenceCountedDB(metadataStore, containerDBPath);
           addDB(containerDBPath, db);
         }
         db.incrementReference();
         return db;
       } catch (Exception e) {
         LOG.error("Error opening DB. Container:{} ContainerPath:{}",
             containerID, containerDBPath, e);
         throw e;
       } finally {
         slock.unlock();
       }
     }
   ```
   
   We still have the global lock to protect the ContainerCache structure, but we now have a stripped lock to ensure that the same ContainerPath cannot be initialized at the same time.
   
   I am not sure if we need to worry about another thread calling addDB - if that is the case, we would need to add back the logic you had before, where you take the global lock again, and then test to see if an entry for the DB has been added to the cache, and if so close the new instance.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org