You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/08/24 22:42:40 UTC

[GitHub] [ozone] jwminton opened a new pull request #2572: HDDS-5435 Reduce usage of OmKeyLocationInfoGroup.getLocationList()

jwminton opened a new pull request #2572:
URL: https://github.com/apache/ozone/pull/2572


   ## What changes were proposed in this pull request?
   This could be a perplexing set of changes without the explanation:
   
   getLocationList() is costly because the following java8 Stream<> sequence is run every time:
      values()                      // Collection of info lists
      .stream()                     // creates the stream<> of lists
      .flatMap(List::stream)        // flattens incoming lists into a single flat info stream
      .collect(Collectors.toList()  // terminate the stream processing with a java List of values
   
   Many callers of this code then proceed to create a stream() on the same List of infos again for more java8 stream<> processing. The exercise here is to see where this can be unwound and made more efficient with the help of getLocationLists() which is only the values() starting step itself thus avoiding an unneeded collect/stream steps in overall processing.
   
   A good first example substitution to look at is KeyInputStream.java with the following original bad sequence:
       getLocationList()
       .stream()
       .filter(l -> l.getBlockID().equals(blockID))
       .collect(Collectors.toList());
   
   Substituting in getLocationList() calls, logically this unwinds to:
       values()
       .stream()
       .flatMap(List::stream)
       .collect(Collectors.toList()
       .stream()
       .filter(l -> l.getBlockID().equals(blockID))
       .collect(Collectors.toList());
   
   The collect/stream sequence is wasted execution so it's removed, leaving this:
       values()
       .stream()
       .flatMap(List::stream)
       .filter(l -> l.getBlockID().equals(blockID))
       .collect(Collectors.toList());
   
   Which when you look back at the original code, the basic code substitution pattern turns out to be:
     * change getLocationList() to getLocationLists()
     * insert or use an exising stream() step
     * after stream, insert a flatMap(List::stream) step
     * continue stream<> processing
   Making the final sequence for KeyInputStream be:
       getLocationLists()
       .stream()
       .flatMap(List::stream)
       .filter(l -> l.getBlockID().equals(blockID))
       .collect(Collectors.toList());
   
   The next pattern is a forEach based substitution where the important part to realize is that post-changes, its a terminating Stream#forEach() and not a Collection#forEach() that executes.
   
   Some places were not changed. In particular org.apache.hadoop.ozone.recon.tasks.ContainerKeyMapperTask#writeOMKeyToContainerDB was not touched because of the checked exceptions which are counter to Java8 Stream usage.
   
   Neither did I touch the test classes.
   
   Finally, changes to OMKeyRequest.java are best understood after having gone through all of the other substitutions and looking at it from the persepctive of the stream steps invovled. It's not stream based, but logically if it were, its doing the stream equivalant to this:
       values()
       .stream()
       .flatMap()
       .collect()
       .stream()
       .forEach( { bytesUsed += locationInfo.getLength() * keyFactor; } )
   
   Where forEach and bytesUsed is really being used to sum up all of the locationInfo lengths * keyfactor. So, delete the collect/stream pair as it would be wasted execution, replace the forEach with a better suited mapToLong & sum up the list of longs and then multiply by the scaling factor once at the end:
       getLocationLists()
       .stream()
       .flatMap(List::stream)
       .mapToLong(OmKeyLocationInfo::getLength)
       .sum() * keyFactor;
   
   Having gone through this exercise, the following as a convenience call would hide two of the stream steps when the info list is being used as a stream source:
     public java.util.stream.Stream<OmKeyLocationInfo> getLocationStream() {
       return locationVersionMap.values().stream().flatMap(List::stream);
     }
   
   I'm not sure if the overhead of adding new api and a new exported class dependency are worth it, but all of these calls look a lot better with this as the sequence starting api instead. However, given that getLocationLists() already exists, it's probably not be worth it.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5435
   
   ## How was this patch tested?
   
   Manual mvn test runs, CI tests. IDE tests I waited for the CI tests to stabilize since these changes are in close proximity to some of the failures that were being seen.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] jwminton closed pull request #2572: HDDS-5435 Reduce usage of OmKeyLocationInfoGroup.getLocationList()

Posted by GitBox <gi...@apache.org>.
jwminton closed pull request #2572:
URL: https://github.com/apache/ozone/pull/2572


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2572: HDDS-5435 Reduce usage of OmKeyLocationInfoGroup.getLocationList()

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2572:
URL: https://github.com/apache/ozone/pull/2572#discussion_r708059557



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/s3/multipart/S3MultipartUploadCompleteRequest.java
##########
@@ -472,8 +472,10 @@ private long getMultipartDataSize(String requestedVolume,
           .getKeyLocationVersions().get(0);
 
       // Set partNumber in each block.
-      currentKeyInfoGroup.getLocationList().forEach(
-          omKeyLocationInfo -> omKeyLocationInfo.setPartNumber(partNumber));
+      currentKeyInfoGroup.getLocationLists().stream()
+          .flatMap(List::stream)
+          .forEach(omKeyLocationInfo ->
+              omKeyLocationInfo.setPartNumber(partNumber));
 
       partLocationInfos.addAll(currentKeyInfoGroup.getLocationList());

Review comment:
       There is another call to `getLocationList` here.  If we keep it, might as well store the list and reuse it for setting part number, too.

##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/KeyInputStream.java
##########
@@ -167,8 +167,9 @@ private synchronized void initialize(OmKeyInfo keyInfo,
             BlockID blockID = keyLocationInfo.getBlockID();
             List<OmKeyLocationInfo> collect =
                 newKeyInfo.getLatestVersionLocations()
-                .getLocationList()
+                .getLocationLists()
                 .stream()
+                .flatMap(List::stream)
                 .filter(l -> l.getBlockID().equals(blockID))
                 .collect(Collectors.toList());

Review comment:
       Since we only use the first matching location (`collect.get(0)`), this could be further optimized by using `findFirst()` instead of `collect()`:
   
   ```
               return newKeyInfo.getLatestVersionLocations()
                   .getLocationLists()
                   .stream()
                   .flatMap(List::stream)
                   .filter(l -> l.getBlockID().equals(blockID))
                   .findFirst()
                   .map(OmKeyLocationInfo::getPipeline)
                   .orElse(null);
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org