Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2021/03/31 04:52:30 UTC

[GitHub] [hadoop] Hexiaoqiao commented on a change in pull request #2838: HDFS-15937. Reduce memory used during datanode layout upgrade

Hexiaoqiao commented on a change in pull request #2838:
URL: https://github.com/apache/hadoop/pull/2838#discussion_r604588150



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java
##########
@@ -1071,12 +1071,26 @@ private static void linkAllBlocks(File fromDir, File fromBbwDir, File toDir,
   }
 
   private static class LinkArgs {
-    File src;
-    File dst;
+    private File srcDir;
+    private File dstDir;
+    private String blockFile;
+
+    LinkArgs(File srcDir, File dstDir, String blockFile) {
+      this.srcDir = srcDir;
+      this.dstDir = dstDir;
+      this.blockFile = blockFile;
+    }
+
+    public File src() {

Review comment:
       Would `getSrc` be more graceful? The same applies to the methods `dst(...)` and `blockFile()`.
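A minimal sketch of what the suggested naming could look like, assuming the class keeps the shared directory reference plus a block name and builds the full `File` on demand (the class name `LinkArgsSketch` and exact method bodies here are illustrative, not the code merged in the PR):

```java
import java.io.File;

// Hypothetical sketch of LinkArgs using conventional getter names
// (getSrc/getDst/getBlockFile), as the review suggests; the actual
// names in the PR may differ.
public class LinkArgsSketch {
  private final File srcDir;
  private final File dstDir;
  private final String blockFile;

  LinkArgsSketch(File srcDir, File dstDir, String blockFile) {
    this.srcDir = srcDir;
    this.dstDir = dstDir;
    this.blockFile = blockFile;
  }

  // Build the full File on demand instead of storing it, so each
  // instance holds only a shared directory reference plus a name.
  public File getSrc() { return new File(srcDir, blockFile); }

  public File getDst() { return new File(dstDir, blockFile); }

  public String getBlockFile() { return blockFile; }

  public static void main(String[] args) {
    LinkArgsSketch la = new LinkArgsSketch(
        new File("/vol/prev"), new File("/vol/cur"), "blk_100");
    System.out.println(la.getSrc().getName()); // blk_100
  }
}
```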

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java
##########
@@ -1161,10 +1176,12 @@ public Void call() throws IOException {
        */
       @Override
       public int compare(LinkArgs a, LinkArgs b) {
+        File asrc = a.src();
+        File bsrc = b.src();
         return ComparisonChain.start().
-            compare(a.src.getName(), b.src.getName()).
-            compare(a.src, b.src).
-            compare(a.dst, b.dst).
+            compare(asrc.getName(), bsrc.getName()).
+            compare(asrc, bsrc).

Review comment:
       Is the comparator duplicated between `compare(asrc.getName(), bsrc.getName())` and `compare(asrc, bsrc)`?
   For UnixFileSystem, the implementation is as follows. Not sure if it is the same for other FileSystem instances.
   >      public int compare(File f1, File f2) {
   >         return f1.getPath().compareTo(f2.getPath());
   >     }
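On UnixFileSystem the two compares are not duplicates: `getName()` looks only at the final path component, while comparing the `File` objects compares the full path, so the second compare acts as a tie-breaker for same-named files in different directories. A small sketch (paths here are made up for illustration):

```java
import java.io.File;

// Two files with the same name in different directories: comparing
// names alone cannot distinguish them, but comparing the File
// objects (full paths on UnixFileSystem) can.
public class CompareDemo {
  public static void main(String[] args) {
    File f1 = new File("/vol1/current", "blk_100");
    File f2 = new File("/vol2/current", "blk_100");

    int byName = f1.getName().compareTo(f2.getName());
    int byPath = f1.compareTo(f2);

    System.out.println(byName == 0); // true: names match
    System.out.println(byPath != 0); // true: full paths differ
  }
}
```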

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java
##########
@@ -1345,8 +1363,18 @@ public boolean accept(File dir, String name) {
               throw new IOException("Failed to mkdirs " + blockLocation);
             }
           }
-          idBasedLayoutSingleLinks.add(new LinkArgs(new File(from, blockName),
-              new File(blockLocation, blockName)));
+          /**
+           * The destination path is 32x32, so 1024 distinct paths. Therefore
+           * we cache the destination path and reuse the same File object on
+           * potentially thousands of blocks located on this volume.
+           * This method is called recursively so the cache is passed through
+           * each recursive call. There is one cache per volume, and it is only
+           * accessed by a single thread so no locking is needed.
+           */
+          File cachedDest = pathCache
+              .computeIfAbsent(blockLocation, k -> blockLocation);
+          idBasedLayoutSingleLinks.add(new LinkArgs(from,

Review comment:
       Here it reuses the prefix path of blocks, so fewer `File` instances are created and memory usage is reduced, right?
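The `computeIfAbsent(blockLocation, k -> blockLocation)` idiom in the diff is an interning cache: the map stores each destination directory `File` keyed by itself, so every block resolving to the same subdirectory gets back one shared `File` object rather than a fresh instance per block. A sketch of the pattern (the directory path below is illustrative):

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Sketch of the interning pattern from the diff: the map keys each
// destination directory File by itself, so repeated lookups of an
// equal path return one shared instance.
public class PathCacheDemo {
  public static void main(String[] args) {
    Map<File, File> pathCache = new HashMap<>();

    // Equal paths, but two distinct File objects.
    File d1 = new File("/vol/current/finalized/subdir0/subdir1");
    File d2 = new File("/vol/current/finalized/subdir0/subdir1");

    // k -> k behaves like the PR's `k -> blockLocation`: the key
    // itself becomes the cached value on first sight.
    File cached1 = pathCache.computeIfAbsent(d1, k -> k);
    File cached2 = pathCache.computeIfAbsent(d2, k -> k);

    System.out.println(d1 == d2);           // false: distinct objects
    System.out.println(cached1 == cached2); // true: cache collapses them
  }
}
```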




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org