Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/06/25 01:31:41 UTC

[GitHub] [ozone] jojochuang opened a new pull request #2367: HDDS-5384 OM refreshPipeline should not invoke the expensive OmKeyLocationInfoGroup.getLocationList()

jojochuang opened a new pull request #2367:
URL: https://github.com/apache/ozone/pull/2367


   ## What changes were proposed in this pull request?
   Reduce the CPU/heap allocation overhead of listStatus
   
   ## What is the link to the Apache JIRA
   HDDS-5384
   
   ## How was this patch tested?
   Please find the flamegraph attached in the corresponding JIRA for the cpu and heap allocation before and after the change.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] jojochuang commented on a change in pull request #2367: HDDS-5384 OM refreshPipeline should not invoke the expensive OmKeyLocationInfoGroup.getLocationList()

Posted by GitBox <gi...@apache.org>.
jojochuang commented on a change in pull request #2367:
URL: https://github.com/apache/ozone/pull/2367#discussion_r658487759



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -745,33 +745,34 @@ protected void refreshPipeline(List<OmKeyInfo> keyList) throws IOException {
     }
 
     Set<Long> containerIDs = new HashSet<>();
-    for (OmKeyInfo keyInfo : keyList) {
-      List<OmKeyLocationInfoGroup> locationInfoGroups =
-          keyInfo.getKeyLocationVersions();
-
-      for (OmKeyLocationInfoGroup key : locationInfoGroups) {
-        for (OmKeyLocationInfo k : key.getLocationList()) {
-          containerIDs.add(k.getContainerID());
-        }
-      }
-    }
+    keyList.forEach(
+        keyInfo -> keyInfo.getKeyLocationVersions().forEach(
+            key -> key.getLocationLists().forEach(
+                OmKeyLocationInfoList -> OmKeyLocationInfoList.forEach(
+                    k -> containerIDs.add(k.getContainerID())

Review comment:
       No... it's just a matter of code style. Having four layers of for loops looks bad, but four layers of forEach isn't any better.






[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2367: HDDS-5384 OM refreshPipeline should not invoke the expensive OmKeyLocationInfoGroup.getLocationList()

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2367:
URL: https://github.com/apache/ozone/pull/2367#discussion_r658482947



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -745,33 +745,34 @@ protected void refreshPipeline(List<OmKeyInfo> keyList) throws IOException {
     }
 
     Set<Long> containerIDs = new HashSet<>();
-    for (OmKeyInfo keyInfo : keyList) {
-      List<OmKeyLocationInfoGroup> locationInfoGroups =
-          keyInfo.getKeyLocationVersions();
-
-      for (OmKeyLocationInfoGroup key : locationInfoGroups) {
-        for (OmKeyLocationInfo k : key.getLocationList()) {
-          containerIDs.add(k.getContainerID());
-        }
-      }
-    }
+    keyList.forEach(
+        keyInfo -> keyInfo.getKeyLocationVersions().forEach(
+            key -> key.getLocationLists().forEach(
+                OmKeyLocationInfoList -> OmKeyLocationInfoList.forEach(
+                    k -> containerIDs.add(k.getContainerID())

Review comment:
       I see this just uses forEach instead of for, in addition to the new method usage.
   Just curious, is there any reason to change it this way?
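
   As a rough illustration of the two changes bundled in this hunk, the sketch below uses a hypothetical stand-in class (`LocationGroupSketch`, with plain `Long` container IDs in place of `OmKeyLocationInfo`) rather than the real Ozone classes. Under those assumptions, the iteration style does not affect the result, while the choice of getter decides whether a flattened copy is allocated on every call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for OmKeyLocationInfoGroup; each Long models a container ID.
class LocationGroupSketch {
  private final Map<Long, List<Long>> locationVersionMap = new LinkedHashMap<>();

  void put(long version, Long... containerIds) {
    locationVersionMap.put(version, new ArrayList<>(Arrays.asList(containerIds)));
  }

  // Flattens all per-version lists into a newly allocated list on every call.
  List<Long> getLocationList() {
    return locationVersionMap.values().stream().flatMap(List::stream)
        .collect(Collectors.toList());
  }

  // Returns the map's existing values view; nothing is copied.
  Collection<List<Long>> getLocationLists() {
    return locationVersionMap.values();
  }

  // Old shape: iterate over the flattened copy.
  static Set<Long> idsViaFlattenedCopy(List<LocationGroupSketch> groups) {
    Set<Long> ids = new HashSet<>();
    for (LocationGroupSketch group : groups) {
      for (Long id : group.getLocationList()) {
        ids.add(id);
      }
    }
    return ids;
  }

  // New shape: iterate over the per-version lists directly, with plain loops.
  static Set<Long> idsViaNestedLists(List<LocationGroupSketch> groups) {
    Set<Long> ids = new HashSet<>();
    for (LocationGroupSketch group : groups) {
      for (List<Long> locationList : group.getLocationLists()) {
        ids.addAll(locationList);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    LocationGroupSketch group = new LocationGroupSketch();
    group.put(1L, 10L, 11L);
    group.put(2L, 11L, 12L);
    List<LocationGroupSketch> groups = Collections.singletonList(group);
    // Both traversals collect the same set of container IDs.
    System.out.println(idsViaFlattenedCopy(groups).equals(idsViaNestedLists(groups)));
  }
}
```

   The second traversal keeps ordinary for loops, which hints that the saving comes from skipping the flattened copy and not from forEach itself.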






[GitHub] [ozone] jojochuang commented on pull request #2367: HDDS-5384 OM refreshPipeline should not invoke the expensive OmKeyLocationInfoGroup.getLocationList()

Posted by GitBox <gi...@apache.org>.
jojochuang commented on pull request #2367:
URL: https://github.com/apache/ozone/pull/2367#issuecomment-868273941


   The test failure seems unrelated. TestWatchForCommit fails occasionally even without this change.




[GitHub] [ozone] jojochuang commented on a change in pull request #2367: HDDS-5384 OM refreshPipeline should not invoke the expensive OmKeyLocationInfoGroup.getLocationList()

Posted by GitBox <gi...@apache.org>.
jojochuang commented on a change in pull request #2367:
URL: https://github.com/apache/ozone/pull/2367#discussion_r668416760



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -745,33 +745,34 @@ protected void refreshPipeline(List<OmKeyInfo> keyList) throws IOException {
     }
 
     Set<Long> containerIDs = new HashSet<>();
-    for (OmKeyInfo keyInfo : keyList) {
-      List<OmKeyLocationInfoGroup> locationInfoGroups =
-          keyInfo.getKeyLocationVersions();
-
-      for (OmKeyLocationInfoGroup key : locationInfoGroups) {
-        for (OmKeyLocationInfo k : key.getLocationList()) {
-          containerIDs.add(k.getContainerID());
-        }
-      }
-    }
+    keyList.forEach(
+        keyInfo -> keyInfo.getKeyLocationVersions().forEach(
+            key -> key.getLocationLists().forEach(
+                OmKeyLocationInfoList -> OmKeyLocationInfoList.forEach(
+                    k -> containerIDs.add(k.getContainerID())

Review comment:
       ok let's do that.






[GitHub] [ozone] jojochuang commented on a change in pull request #2367: HDDS-5384 OM refreshPipeline should not invoke the expensive OmKeyLocationInfoGroup.getLocationList()

Posted by GitBox <gi...@apache.org>.
jojochuang commented on a change in pull request #2367:
URL: https://github.com/apache/ozone/pull/2367#discussion_r658414430



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OmKeyLocationInfoGroup.java
##########
@@ -89,17 +90,27 @@ public long getVersion() {
     return version;
   }
 
+  /**
+   * Use this expensive method only when absolutely needed!
+   * It creates a new list so it is not an O(1) operation.
+   * Use getLocationLists() instead.
+   * @return a list of OmKeyLocationInfo
+   */
   public List<OmKeyLocationInfo> getLocationList() {
     return locationVersionMap.values().stream().flatMap(List::stream)
         .collect(Collectors.toList());
   }
 
+  public Collection<List<OmKeyLocationInfo>> getLocationLists() {
+    return locationVersionMap.values();
+  }
+
   public long getLocationListCount() {
     return locationVersionMap.values().stream().mapToLong(List::size).sum();
   }
 
-  public List<OmKeyLocationInfo> getLocationList(Long versionToFetch) {
-    return new ArrayList<>(locationVersionMap.get(versionToFetch));
+  public Collection<OmKeyLocationInfo> getLocationList(Long versionToFetch) {

Review comment:
       This is a minor issue, but copying into a new ArrayList is an O(n) operation. I don't see this one coming up in the flamegraph (which means its cost is probably minimal), but it's good practice to reduce it to O(1).
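
   To make the O(n) versus O(1) point concrete, here is a generic sketch. The body of the changed method is not shown in the hunk above, so this is not the patch itself: `CopyVersusViewSketch`, `copyOf` and `viewOf` are hypothetical names, and `Collections.unmodifiableList` is just one possible way to hand out a list without copying it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; not taken from OmKeyLocationInfoGroup.
class CopyVersusViewSketch {
  // Defensive copy: allocates a new list and copies every element, O(n).
  static List<Long> copyOf(List<Long> backing) {
    return new ArrayList<>(backing);
  }

  // Read-only view: wraps the backing list without copying elements, O(1).
  static List<Long> viewOf(List<Long> backing) {
    return Collections.unmodifiableList(backing);
  }

  public static void main(String[] args) {
    List<Long> backing = new ArrayList<>(Arrays.asList(1L, 2L));
    List<Long> copy = copyOf(backing);
    List<Long> view = viewOf(backing);
    backing.add(3L);
    System.out.println(copy.size()); // prints 2: the copy is a snapshot
    System.out.println(view.size()); // prints 3: the view tracks the backing list
  }
}
```

   The trade-off is that a view exposes later changes to the backing list, so callers that relied on getting a snapshot would need to be checked before switching.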






[GitHub] [ozone] jojochuang commented on a change in pull request #2367: HDDS-5384 OM refreshPipeline should not invoke the expensive OmKeyLocationInfoGroup.getLocationList()

Posted by GitBox <gi...@apache.org>.
jojochuang commented on a change in pull request #2367:
URL: https://github.com/apache/ozone/pull/2367#discussion_r659252953



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -745,33 +745,34 @@ protected void refreshPipeline(List<OmKeyInfo> keyList) throws IOException {
     }
 
     Set<Long> containerIDs = new HashSet<>();
-    for (OmKeyInfo keyInfo : keyList) {
-      List<OmKeyLocationInfoGroup> locationInfoGroups =
-          keyInfo.getKeyLocationVersions();
-
-      for (OmKeyLocationInfoGroup key : locationInfoGroups) {
-        for (OmKeyLocationInfo k : key.getLocationList()) {
-          containerIDs.add(k.getContainerID());
-        }
-      }
-    }
+    keyList.forEach(
+        keyInfo -> keyInfo.getKeyLocationVersions().forEach(
+            key -> key.getLocationLists().forEach(
+                OmKeyLocationInfoList -> OmKeyLocationInfoList.forEach(
+                    k -> containerIDs.add(k.getContainerID())

Review comment:
       Another thing I would also like to try is using parallelStream() to parallelize this part. This needs a benchmark first.
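
   A minimal sketch of what such an experiment might look like, assuming the key/version/location structure is modelled as nested lists of `Long` container IDs (`ParallelContainerIdsSketch` and `collectIds` are hypothetical names, not Ozone code). One point worth noting: calling `containerIDs.add(...)` on a shared `HashSet` from inside a parallel `forEach` would not be thread-safe, so the sketch collects with `Collectors.toSet()` instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical model: keys -> location lists per version -> container IDs.
class ParallelContainerIdsSketch {
  static Set<Long> collectIds(List<List<List<Long>>> keys, boolean parallel) {
    Stream<List<List<Long>>> stream =
        parallel ? keys.parallelStream() : keys.stream();
    // The collector builds per-thread partial sets and merges them,
    // so no shared mutable set is touched concurrently.
    return stream
        .flatMap(List::stream)   // location lists of each key
        .flatMap(List::stream)   // container IDs of each location list
        .collect(Collectors.toSet());
  }

  public static void main(String[] args) {
    List<List<List<Long>>> keys = Arrays.asList(
        Arrays.asList(Arrays.asList(1L, 2L), Arrays.asList(2L, 3L)),
        Arrays.asList(Arrays.asList(3L, 4L)));
    // Sequential and parallel runs should yield the same set.
    System.out.println(collectIds(keys, false).equals(collectIds(keys, true)));
  }
}
```

   Whether the parallel variant is faster depends on the number of keys and the cost of splitting and merging, which is why a benchmark would have to decide.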






[GitHub] [ozone] adoroszlai merged pull request #2367: HDDS-5384 OM refreshPipeline should not invoke the expensive OmKeyLocationInfoGroup.getLocationList()

Posted by GitBox <gi...@apache.org>.
adoroszlai merged pull request #2367:
URL: https://github.com/apache/ozone/pull/2367


   




[GitHub] [ozone] adoroszlai commented on a change in pull request #2367: HDDS-5384 OM refreshPipeline should not invoke the expensive OmKeyLocationInfoGroup.getLocationList()

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2367:
URL: https://github.com/apache/ozone/pull/2367#discussion_r668052353



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -745,33 +745,34 @@ protected void refreshPipeline(List<OmKeyInfo> keyList) throws IOException {
     }
 
     Set<Long> containerIDs = new HashSet<>();
-    for (OmKeyInfo keyInfo : keyList) {
-      List<OmKeyLocationInfoGroup> locationInfoGroups =
-          keyInfo.getKeyLocationVersions();
-
-      for (OmKeyLocationInfoGroup key : locationInfoGroups) {
-        for (OmKeyLocationInfo k : key.getLocationList()) {
-          containerIDs.add(k.getContainerID());
-        }
-      }
-    }
+    keyList.forEach(
+        keyInfo -> keyInfo.getKeyLocationVersions().forEach(
+            key -> key.getLocationLists().forEach(
+                OmKeyLocationInfoList -> OmKeyLocationInfoList.forEach(
+                    k -> containerIDs.add(k.getContainerID())

Review comment:
       I don't expect that streams are worth it, even in parallel.  Benchmark is welcome in any case.  However, can we separate it from this fix, so that we can get this one in sooner rather than later?






[GitHub] [ozone] bharatviswa504 commented on pull request #2367: HDDS-5384 OM refreshPipeline should not invoke the expensive OmKeyLocationInfoGroup.getLocationList()

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2367:
URL: https://github.com/apache/ozone/pull/2367#issuecomment-868276313


   > The test failure seems unrelated. TestWatchForCommit fails occasionally even without this change.
   
   There seem to be some checkstyle failures; can you fix them? Until the basic checks are clean, we cannot get a complete CI run, because CI is aborted.




[GitHub] [ozone] adoroszlai commented on a change in pull request #2367: HDDS-5384 OM refreshPipeline should not invoke the expensive OmKeyLocationInfoGroup.getLocationList()

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2367:
URL: https://github.com/apache/ozone/pull/2367#discussion_r658609903



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OmKeyLocationInfoGroup.java
##########
@@ -89,17 +90,27 @@ public long getVersion() {
     return version;
   }
 
+  /**
+   * Use this expensive method only when absolutely needed!
+   * It creates a new list so it is not an O(1) operation.
+   * Use getLocationLists() instead.
+   * @return a list of OmKeyLocationInfo
+   */
   public List<OmKeyLocationInfo> getLocationList() {
     return locationVersionMap.values().stream().flatMap(List::stream)
         .collect(Collectors.toList());
   }
 
+  public Collection<List<OmKeyLocationInfo>> getLocationLists() {
+    return locationVersionMap.values();
+  }
+
   public long getLocationListCount() {
     return locationVersionMap.values().stream().mapToLong(List::size).sum();
   }
 
-  public List<OmKeyLocationInfo> getLocationList(Long versionToFetch) {
-    return new ArrayList<>(locationVersionMap.get(versionToFetch));
+  public Collection<OmKeyLocationInfo> getLocationList(Long versionToFetch) {

Review comment:
       According to call hierarchy this is only used in write path, once per key (from `BlockOutputStreamEntryPool#addPreallocateBlocks`).  Based on that, I think we could omit this change now.
   
   Return type does not need to be changed anyway.

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -745,33 +745,34 @@ protected void refreshPipeline(List<OmKeyInfo> keyList) throws IOException {
     }
 
     Set<Long> containerIDs = new HashSet<>();
-    for (OmKeyInfo keyInfo : keyList) {
-      List<OmKeyLocationInfoGroup> locationInfoGroups =
-          keyInfo.getKeyLocationVersions();
-
-      for (OmKeyLocationInfoGroup key : locationInfoGroups) {
-        for (OmKeyLocationInfo k : key.getLocationList()) {
-          containerIDs.add(k.getContainerID());
-        }
-      }
-    }
+    keyList.forEach(
+        keyInfo -> keyInfo.getKeyLocationVersions().forEach(
+            key -> key.getLocationLists().forEach(
+                OmKeyLocationInfoList -> OmKeyLocationInfoList.forEach(

Review comment:
       Variable name looks like class name.
   
   ```suggestion
                   locationList -> locationList.forEach(
   ```
   
   (Same in the other loop below.)






[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2367: HDDS-5384 OM refreshPipeline should not invoke the expensive OmKeyLocationInfoGroup.getLocationList()

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2367:
URL: https://github.com/apache/ozone/pull/2367#discussion_r658523610



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -745,33 +745,34 @@ protected void refreshPipeline(List<OmKeyInfo> keyList) throws IOException {
     }
 
     Set<Long> containerIDs = new HashSet<>();
-    for (OmKeyInfo keyInfo : keyList) {
-      List<OmKeyLocationInfoGroup> locationInfoGroups =
-          keyInfo.getKeyLocationVersions();
-
-      for (OmKeyLocationInfoGroup key : locationInfoGroups) {
-        for (OmKeyLocationInfo k : key.getLocationList()) {
-          containerIDs.add(k.getContainerID());
-        }
-      }
-    }
+    keyList.forEach(
+        keyInfo -> keyInfo.getKeyLocationVersions().forEach(
+            key -> key.getLocationLists().forEach(
+                OmKeyLocationInfoList -> OmKeyLocationInfoList.forEach(
+                    k -> containerIDs.add(k.getContainerID())

Review comment:
       Yes. But if you feel that this way looks good, I am okay with it.
   In general, if a change is just a code refactor, it would be nice to avoid it, so that git annotate still shows which commit originally added the code. Once this change is merged, git annotate will show this commit as the last one for these lines.






[GitHub] [ozone] jojochuang commented on a change in pull request #2367: HDDS-5384 OM refreshPipeline should not invoke the expensive OmKeyLocationInfoGroup.getLocationList()

Posted by GitBox <gi...@apache.org>.
jojochuang commented on a change in pull request #2367:
URL: https://github.com/apache/ozone/pull/2367#discussion_r668671565



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OmKeyLocationInfoGroup.java
##########
@@ -89,17 +90,27 @@ public long getVersion() {
     return version;
   }
 
+  /**
+   * Use this expensive method only when absolutely needed!
+   * It creates a new list so it is not an O(1) operation.
+   * Use getLocationLists() instead.
+   * @return a list of OmKeyLocationInfo
+   */
   public List<OmKeyLocationInfo> getLocationList() {
     return locationVersionMap.values().stream().flatMap(List::stream)
         .collect(Collectors.toList());
   }
 
+  public Collection<List<OmKeyLocationInfo>> getLocationLists() {
+    return locationVersionMap.values();
+  }
+
   public long getLocationListCount() {
     return locationVersionMap.values().stream().mapToLong(List::size).sum();
   }
 
-  public List<OmKeyLocationInfo> getLocationList(Long versionToFetch) {
-    return new ArrayList<>(locationVersionMap.get(versionToFetch));
+  public Collection<OmKeyLocationInfo> getLocationList(Long versionToFetch) {

Review comment:
       ok omit this for now.



