You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/03/15 16:34:12 UTC

[GitHub] [pulsar] dave2wave opened a new pull request #14694: [Issue 14693][Offloader] JCloud Offloader Now Removes S3 Objects

dave2wave opened a new pull request #14694:
URL: https://github.com/apache/pulsar/pull/14694


   
   Fixes #14693 
   
   ### Motivation
   
   Offloaded S3 Blob Objects should be removed from the S3 Bucket when the retention period has expired or the topic is deleted.
   
   ### Modifications
   
   Modified `BlobStoreManagedLedgerOffloader.deleteOffloaded` to use `BlobStore.removeBlob` instead of `BlobStore.removeBlobs` which no longer works as epxected.
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   This change is a small rework without any test coverage.
   CI currently cannot test this long running scenario.
   The fix was tested using OMB and `./pulsar-admin topics delete`
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API: (no)
     - The schema: (no)
     - The default values of configurations: (no)
     - The wire protocol: (no)
     - The rest endpoints: (no)
     - The admin cli options: (no)
     - Anything that affects deployment: (no)
   
   ### Documentation
   
   Check the box below or label this PR directly (if you have committer privilege).
   
   Need to update docs? 
   
   - [X] `no-need-doc` 
     
     This change simply fixes a bug.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] michaeljmarshall commented on pull request #14694: [Issue 14693][Offloader] JCloud Offloader Now Removes S3 Objects

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on pull request #14694:
URL: https://github.com/apache/pulsar/pull/14694#issuecomment-1081186940


   @eolivelli - based on reading through the JClouds code, `removeBlob` will also fail silently if the object does not exist.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] michaeljmarshall commented on a change in pull request #14694: [Issue 14693][Offloader] JCloud Offloader Now Removes S3 Objects

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on a change in pull request #14694:
URL: https://github.com/apache/pulsar/pull/14694#discussion_r828280637



##########
File path: tiered-storage/jcloud/src/main/java/org/apache/bookkeeper/mledger/offload/jcloud/impl/BlobStoreManagedLedgerOffloader.java
##########
@@ -555,13 +555,18 @@ private BlobStoreLocation getBlobStoreLocation(Map<String, String> offloadDriver
         BlobStoreLocation bsKey = getBlobStoreLocation(offloadDriverMetadata);
         String readBucket = bsKey.getBucket(offloadDriverMetadata);
         BlobStore readBlobstore = blobStores.get(config.getBlobStoreLocation());
+        String dataContent = DataBlockUtils.dataBlockOffloadKey(ledgerId, uid);
+        String indexContent = DataBlockUtils.indexBlockOffloadKey(ledgerId, uid);
 
         CompletableFuture<Void> promise = new CompletableFuture<>();
         scheduler.chooseThread(ledgerId).submit(() -> {
             try {
-                readBlobstore.removeBlobs(readBucket,
-                    ImmutableList.of(DataBlockUtils.dataBlockOffloadKey(ledgerId, uid),
-                                     DataBlockUtils.indexBlockOffloadKey(ledgerId, uid)));

Review comment:
       I think it is worth understanding the problem a little more before we change the offloading code to use two network calls where one should be sufficient.
   
   > If I am speculating then I did have to explicitly set the S3 URL because I am in the `us-west-2` region instead of the default for S3 region of `us-east-1`. I wonder if the POST method has a bad header.
   
   In your initial test where the `removeBlobs` method failed, did you supply the `s3ManagedLedgerOffloadServiceEndpoint` config? I think this endpoint might actually be required when using any region outside of `us-east-1`. The jclouds default is `https://s3.amazonaws.com` and the Amazon documentation says that endpoint is for `us-east-1` https://docs.aws.amazon.com/general/latest/gr/s3.html.
   
   The Pulsar documentation says that the `s3ManagedLedgerOffloadServiceEndpoint` config is optional, and indeed it should be because you can build the S3 endpoint from the region, but in looking at the offloader code, I think it might currently be necessary to supply the `s3ManagedLedgerOffloadServiceEndpoint` when not in `us-east-1`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] dave2wave commented on a change in pull request #14694: [Issue 14693][Offloader] JCloud Offloader Now Removes S3 Objects

Posted by GitBox <gi...@apache.org>.
dave2wave commented on a change in pull request #14694:
URL: https://github.com/apache/pulsar/pull/14694#discussion_r828284045



##########
File path: tiered-storage/jcloud/src/main/java/org/apache/bookkeeper/mledger/offload/jcloud/impl/BlobStoreManagedLedgerOffloader.java
##########
@@ -555,13 +555,18 @@ private BlobStoreLocation getBlobStoreLocation(Map<String, String> offloadDriver
         BlobStoreLocation bsKey = getBlobStoreLocation(offloadDriverMetadata);
         String readBucket = bsKey.getBucket(offloadDriverMetadata);
         BlobStore readBlobstore = blobStores.get(config.getBlobStoreLocation());
+        String dataContent = DataBlockUtils.dataBlockOffloadKey(ledgerId, uid);
+        String indexContent = DataBlockUtils.indexBlockOffloadKey(ledgerId, uid);
 
         CompletableFuture<Void> promise = new CompletableFuture<>();
         scheduler.chooseThread(ledgerId).submit(() -> {
             try {
-                readBlobstore.removeBlobs(readBucket,
-                    ImmutableList.of(DataBlockUtils.dataBlockOffloadKey(ledgerId, uid),
-                                     DataBlockUtils.indexBlockOffloadKey(ledgerId, uid)));

Review comment:
       Yes, I had to supply `S3ManagedLedgerOffloadServiceEndpoint`. Without that offloading failed with an auth error of region mismatch.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on pull request #14694: [Issue 14693][Offloader] JCloud Offloader Now Removes S3 Objects

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #14694:
URL: https://github.com/apache/pulsar/pull/14694#issuecomment-1081072120


   I think that this is needed anyway.
   Because if removeBlobs fails silently without returning a error then nobody will know and Pulsar will proceed and consider the object to be deleted
   
   @codelipenghui @merlimat WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] dave2wave commented on a change in pull request #14694: [Issue 14693][Offloader] JCloud Offloader Now Removes S3 Objects

Posted by GitBox <gi...@apache.org>.
dave2wave commented on a change in pull request #14694:
URL: https://github.com/apache/pulsar/pull/14694#discussion_r828115697



##########
File path: tiered-storage/jcloud/src/main/java/org/apache/bookkeeper/mledger/offload/jcloud/impl/BlobStoreManagedLedgerOffloader.java
##########
@@ -555,13 +555,18 @@ private BlobStoreLocation getBlobStoreLocation(Map<String, String> offloadDriver
         BlobStoreLocation bsKey = getBlobStoreLocation(offloadDriverMetadata);
         String readBucket = bsKey.getBucket(offloadDriverMetadata);
         BlobStore readBlobstore = blobStores.get(config.getBlobStoreLocation());
+        String dataContent = DataBlockUtils.dataBlockOffloadKey(ledgerId, uid);
+        String indexContent = DataBlockUtils.indexBlockOffloadKey(ledgerId, uid);
 
         CompletableFuture<Void> promise = new CompletableFuture<>();
         scheduler.chooseThread(ledgerId).submit(() -> {
             try {
-                readBlobstore.removeBlobs(readBucket,
-                    ImmutableList.of(DataBlockUtils.dataBlockOffloadKey(ledgerId, uid),
-                                     DataBlockUtils.indexBlockOffloadKey(ledgerId, uid)));

Review comment:
       Exactly, DeleteObjects fails without throwing an exception. Rather than speculate why I chose to try to see if two consecutive DeleteObject calls would work and they did.
   
   If I am speculating then I did have to explicitly set the S3 URL because I am in the `us-west-2` region instead of the default for S3 region of `us-east-1`. I wonder if the POST method has a bad header.
   
   I gather that this once worked and I looked at the JCloud code history.... but don't see anything of concern.
   
   While this fix may not be the "best" I think it is important to share since leaked S3 objects cost money and are hard to clean up.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on a change in pull request #14694: [Issue 14693][Offloader] JCloud Offloader Now Removes S3 Objects

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on a change in pull request #14694:
URL: https://github.com/apache/pulsar/pull/14694#discussion_r827550555



##########
File path: tiered-storage/jcloud/src/main/java/org/apache/bookkeeper/mledger/offload/jcloud/impl/BlobStoreManagedLedgerOffloader.java
##########
@@ -555,13 +555,18 @@ private BlobStoreLocation getBlobStoreLocation(Map<String, String> offloadDriver
         BlobStoreLocation bsKey = getBlobStoreLocation(offloadDriverMetadata);
         String readBucket = bsKey.getBucket(offloadDriverMetadata);
         BlobStore readBlobstore = blobStores.get(config.getBlobStoreLocation());
+        String dataContent = DataBlockUtils.dataBlockOffloadKey(ledgerId, uid);
+        String indexContent = DataBlockUtils.indexBlockOffloadKey(ledgerId, uid);
 
         CompletableFuture<Void> promise = new CompletableFuture<>();
         scheduler.chooseThread(ledgerId).submit(() -> {
             try {
-                readBlobstore.removeBlobs(readBucket,
-                    ImmutableList.of(DataBlockUtils.dataBlockOffloadKey(ledgerId, uid),
-                                     DataBlockUtils.indexBlockOffloadKey(ledgerId, uid)));

Review comment:
       Hi @dave2wave, thanks for the patch to fix the issue.
   
   Do you mean here will not throw exceptions but the blob will not be deleted?
   
   I have tried to check more details about the DeleteBlob API and DeleteBlobs API
   
   From the Jcloud S3Client:
   ```java
      /**
       * Removes the object and metadata associated with the key.
       * <p/>
       * The DELETE request operation removes the specified object from Amazon S3. Once deleted, there
       * is no method to restore or undelete an object.
       * 
       * 
       * @param bucketName
       *           namespace of the object you are deleting
       * @param key
       *           unique key in the s3Bucket identifying the object
       * @throws org.jclouds.http.HttpResponseException
       *            if the bucket is not available
       */
      @Named("DeleteObject")
      @DELETE
      @Path("/{key}")
      @Fallback(VoidOnNotFoundOr404.class)
      void deleteObject(@Bucket @EndpointParam(parser = AssignCorrectHostnameForBucket.class) @BinderParam(
            BindAsHostPrefixIfConfigured.class) @ParamValidators(BucketNameValidator.class) String bucketName,
            @PathParam("key") String key);
   
      /**
       * The Multi-Object Delete operation enables you to delete multiple objects from a bucket using a
       * single HTTP request. If you know the object keys that you want to delete, then this operation
       * provides a suitable alternative to sending individual delete requests (see DELETE Object),
       * reducing per-request overhead.
       *
       * The Multi-Object Delete request contains a set of up to 1000 keys that you want to delete.
       *
       * If a key does not exist is considered to be deleted.
       *
       * The Multi-Object Delete operation supports two modes for the response; verbose and quiet.
       * By default, the operation uses verbose mode in which the response includes the result of
       * deletion of each key in your request.
       *
       * @param bucketName
       *           namespace of the objects you are deleting
       * @param keys
       *           set of unique keys identifying objects
       */
      @Named("DeleteObject")
      @POST
      @Path("/")
      @QueryParams(keys = "delete")
      @XMLResponseParser(DeleteResultHandler.class)
      DeleteResult deleteObjects(@Bucket @EndpointParam(parser = AssignCorrectHostnameForBucket.class) @BinderParam(
            BindAsHostPrefixIfConfigured.class) @ParamValidators(BucketNameValidator.class) String bucketName,
            @BinderParam(BindIterableAsPayloadToDeleteRequest.class) Iterable<String> keys);
   ```
   
   From the S3 REST API reference:
   
   https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html
   
   Looks like S3 still supports DeleteObjects, but not sure why it can't work in Jcloud.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org