Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/08/02 01:08:50 UTC

[GitHub] [ozone] ayushtkn commented on a diff in pull request #3554: HDDS-6951. Replace bucket.listKeys() with bucket.listStatus() in OmBucketReadWriteKeyOps

ayushtkn commented on code in PR #3554:
URL: https://github.com/apache/ozone/pull/3554#discussion_r935027796


##########
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/OmBucketReadWriteKeyOps.java:
##########
@@ -128,12 +128,10 @@ protected String createPath(String path) {
   @Override
   protected int getReadCount(int readCount, String readPath)
       throws IOException {
-    Iterator<? extends OzoneKey> ozoneKeyIterator = bucket.listKeys(
-        OzoneConsts.OM_KEY_PREFIX + readPath + OzoneConsts.OM_KEY_PREFIX);
-    while (ozoneKeyIterator.hasNext()) {
-      ozoneKeyIterator.next();
-      ++readCount;
-    }
+    List<OzoneFileStatus> ozoneFileStatusList = bucket.listStatus(
+        OzoneConsts.OM_KEY_PREFIX + readPath + OzoneConsts.OM_KEY_PREFIX, true,
+        "/", keyCountForRead);

Review Comment:
   On a quick look, I think the client is controlling the number of entries here via ``keyCountForRead``.
   
   In the FileSystem code as well, it seems to be controlled on the client side:
   https://github.com/apache/ozone/blob/a32549cbc1227291d6ebd972ce32db043a5c51d0/hadoop-ozone/ozonefs-common/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java#L608
   
   In HDFS, AFAIK the NameNode itself imposes such a limit and the client can't explicitly demand the number of entries:
   https://github.com/apache/hadoop/blob/123d1aa8846a2099b04fd8e8ed7f2bd6db4f36b5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java#L364-L366
   
   And the use case here seems to be just getting the ``readCount``, which doesn't actually require ``OzoneFileStatus``; it can easily be looped page by page using ``LISTING_PAGE_SIZE`` or a similar number (rough sketch below). If you go to a scale of around ~1 million entries or beyond, you would be using a lot of memory storing ``FileStatus`` objects you don't actually need. If I remember correctly, it started taking around ~1.5 GB per million entries when I tried something similar with ``HdfsFileStatus`` in DistCp, which sadly builds up the FileStatus tree in a breadth-first traversal. It would surely be less for Ozone than for HDFS.
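   
   Something along these lines is what I had in mind, just a rough, untested sketch: ``PAGE_SIZE``, the startKey handling and ``getKeyInfo().getKeyName()`` are assumptions based on how ``BasicOzoneFileSystem`` pages through ``listStatus``.
   
   ```java
   // Rough sketch only: count entries page by page instead of fetching
   // keyCountForRead entries in a single call, so at most one page of
   // OzoneFileStatus objects is held in memory at a time.
   // PAGE_SIZE, the startKey handling and getKeyInfo().getKeyName() are
   // assumptions modelled on BasicOzoneFileSystem's listStatus paging.
   private static final int PAGE_SIZE = 1024;

   @Override
   protected int getReadCount(int readCount, String readPath)
       throws IOException {
     String keyName =
         OzoneConsts.OM_KEY_PREFIX + readPath + OzoneConsts.OM_KEY_PREFIX;
     String startKey = "";
     List<OzoneFileStatus> page;
     do {
       page = bucket.listStatus(keyName, true, startKey, PAGE_SIZE);
       if (page.isEmpty()) {
         break;
       }
       // Subsequent pages start from (and repeat) the previous last key,
       // so drop that duplicate from the count.
       readCount += startKey.isEmpty() ? page.size() : page.size() - 1;
       startKey = page.get(page.size() - 1).getKeyInfo().getKeyName();
     } while (page.size() == PAGE_SIZE);
     return readCount;
   }
   ```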
   
   I couldn't spare the time to dig in much; please feel free to proceed if you are convinced.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


