Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/09 19:32:08 UTC

[GitHub] [druid] a2l007 opened a new pull request #11899: Reduce list operation calls when pulling segments from S3

a2l007 opened a new pull request #11899:
URL: https://github.com/apache/druid/pull/11899


   While fetching segments from S3, the puller currently creates an object summary (via a LIST operation) for each segment before it GETs the object, so the number of LIST operations grows with the number of segments pulled. LIST operations are more expensive than GETs, so it is desirable to reduce them, especially when the request limit for LIST is much lower than for GET.
   
   This PR creates the object summary lazily, because it isn't needed to pull a segment: the bucket and prefix can be derived from the segment's URI, and the check that the object exists in the bucket is already done before the pull is attempted. This reduces the number of LIST operations made while pulling segments to zero.
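The core idea here, reading the bucket and key straight from the segment's URI instead of looking them up with a LIST call, can be sketched as follows. This is a minimal illustration, not Druid's code: `S3Location` and `fromUri` are hypothetical names, and the sketch assumes segment locations are plain `s3://bucket/key` URIs.

```java
import java.net.URI;

// Hypothetical helper (not Druid's API): reads the bucket and object key
// directly from a segment's s3:// URI, so no LIST request is needed to locate it.
final class S3Location
{
  final String bucket;
  final String key;

  private S3Location(String bucket, String key)
  {
    this.bucket = bucket;
    this.key = key;
  }

  static S3Location fromUri(URI uri)
  {
    if (!"s3".equalsIgnoreCase(uri.getScheme())) {
      throw new IllegalArgumentException("Not an s3:// URI: " + uri);
    }
    // The authority component holds the bucket name. getAuthority() is used rather
    // than getHost(), which returns null for names that are not valid hostnames.
    String bucket = uri.getAuthority();
    String path = uri.getPath();
    if (bucket == null || path == null || path.length() <= 1) {
      throw new IllegalArgumentException("URI has no bucket or key: " + uri);
    }
    // Drop the leading '/', which is not part of the S3 object key.
    return new S3Location(bucket, path.substring(1));
  }
}
```

For example, `S3Location.fromUri(URI.create("s3://my-bucket/druid/segments/wiki/index.zip"))` yields bucket `my-bucket` and key `druid/segments/wiki/index.zip`, which is enough to issue the GET directly. The sketch relies on the existence check having been done earlier, as the description states.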
   
   <hr>
   
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] a2l007 commented on pull request #11899: Reduce list operation calls when pulling segments from S3

Posted by GitBox <gi...@apache.org>.
a2l007 commented on pull request #11899:
URL: https://github.com/apache/druid/pull/11899#issuecomment-965951136


   Thanks for the review @FrankChen021 @kfaraz 



[GitHub] [druid] kfaraz commented on a change in pull request #11899: Reduce list operation calls when pulling segments from S3

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11899:
URL: https://github.com/apache/druid/pull/11899#discussion_r746512782



##########
File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3DataSegmentPuller.java
##########
@@ -231,6 +229,8 @@ public Writer openWriter()
       @Override
       public long getLastModified()
       {
+        final S3ObjectSummary objectSummary =

Review comment:
       You could also check if `s3Object` is already initialized and just use `s3Object.getObjectMetadata().getLastModified()`.
   If not initialized, then you could go for the LIST (or GET if it is cheaper).
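The reviewer's suggestion (reuse the metadata of an already-opened object, and only otherwise make a remote call) might look roughly like this. It is a sketch under stated assumptions, not the code that was merged: `MetadataLookup`, `FetchedObject` and `SegmentByteSource` are hypothetical stand-ins for the S3 client, the fetched `s3Object`, and the puller's byte source, so the example runs without the AWS SDK.

```java
// Hypothetical stand-in for a remote metadata request (LIST, GET or HEAD).
interface MetadataLookup
{
  long lastModifiedMillis(String bucket, String key);
}

// Hypothetical stand-in for an object whose metadata came back with its GET response.
final class FetchedObject
{
  final long lastModifiedMillis;

  FetchedObject(long lastModifiedMillis)
  {
    this.lastModifiedMillis = lastModifiedMillis;
  }
}

// Hypothetical byte source; assumes single-threaded use (no synchronization).
final class SegmentByteSource
{
  private final String bucket;
  private final String key;
  private final MetadataLookup lookup;
  // Populated once the content is opened, analogous to `s3Object` in openInputStream.
  private FetchedObject fetched;

  SegmentByteSource(String bucket, String key, MetadataLookup lookup)
  {
    this.bucket = bucket;
    this.key = key;
    this.lookup = lookup;
  }

  void markFetched(FetchedObject object)
  {
    this.fetched = object;
  }

  long getLastModified()
  {
    // Prefer metadata that already arrived with the object: no extra request.
    if (fetched != null) {
      return fetched.lastModifiedMillis;
    }
    // Otherwise fall back to a remote metadata lookup.
    return lookup.lastModifiedMillis(bucket, key);
  }
}
```

This version still makes one remote call per `getLastModified()` invoked before the object is opened; caching that result as well is what the other review comment on this line asks for.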





[GitHub] [druid] a2l007 merged pull request #11899: Reduce list operation calls when pulling segments from S3

Posted by GitBox <gi...@apache.org>.
a2l007 merged pull request #11899:
URL: https://github.com/apache/druid/pull/11899


   



[GitHub] [druid] kfaraz commented on a change in pull request #11899: Reduce list operation calls when pulling segments from S3

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11899:
URL: https://github.com/apache/druid/pull/11899#discussion_r746505107



##########
File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3DataSegmentPuller.java
##########
@@ -231,6 +229,8 @@ public Writer openWriter()
       @Override
       public long getLastModified()
       {
+        final S3ObjectSummary objectSummary =

Review comment:
       Making the LIST call here every time `getLastModified` is called seems costly.
   Maybe initialize it lazily, but only once, the way `s3Object` is initialized in `openInputStream`?
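The "lazily, but only once" pattern can be sketched as a small memoizing wrapper. `LazyOnce` and `LazySummarySource` are hypothetical names, and the `Supplier<Long>` passed in stands for whatever remote call produces the value (a LIST in the code under review). `get()` is synchronized so that concurrent callers still trigger at most one successful remote call.

```java
import java.util.function.Supplier;

// Hypothetical helper: computes a value on first use and caches it for later calls.
final class LazyOnce<T> implements Supplier<T>
{
  private final Supplier<T> delegate;
  private boolean computed;
  private T value;

  LazyOnce(Supplier<T> delegate)
  {
    this.delegate = delegate;
  }

  @Override
  public synchronized T get()
  {
    // If the delegate throws, nothing is cached and the next call retries.
    if (!computed) {
      value = delegate.get();
      computed = true;
    }
    return value;
  }
}

// Hypothetical byte source whose last-modified lookup is deferred and cached.
final class LazySummarySource
{
  private final Supplier<Long> lastModified;

  LazySummarySource(Supplier<Long> remoteLookup)
  {
    // No remote call happens here; it is deferred until first needed.
    this.lastModified = new LazyOnce<>(remoteLookup);
  }

  long getLastModified()
  {
    return lastModified.get();
  }
}
```

If Guava is already on the classpath, `Suppliers.memoize` provides the same once-only behaviour without a custom class.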



