Posted to notifications@jclouds.apache.org by Andrew Gaul <no...@github.com> on 2015/06/02 06:41:10 UTC

[jclouds] JCLOUDS-894: Add portable multipart upload (#762)

This unifies the provider multipart upload code paths and removes code
duplication.
You can view, comment on, or merge this pull request online at:

  https://github.com/jclouds/jclouds/pull/762

-- Commit Summary --

  * JCLOUDS-894: Add portable multipart upload
  * JCLOUDS-894: Add portable multipart upload for S3
  * JCLOUDS-894: Add portable multipart upload for Azure
  * JCLOUDS-894: Add portable multipart upload for Swift
  * JCLOUDS-894: Odds and ends

-- File Changes --

    M apis/atmos/src/main/java/org/jclouds/atmos/blobstore/AtmosBlobStore.java (7)
    M apis/cloudfiles/src/main/java/org/jclouds/cloudfiles/blobstore/CloudFilesBlobStore.java (5)
    M apis/openstack-swift/src/main/java/org/jclouds/openstack/swift/v1/blobstore/RegionScopedSwiftBlobStore.java (25)
    M apis/openstack-swift/src/test/java/org/jclouds/openstack/swift/v1/blobstore/integration/SwiftBlobIntegrationLiveTest.java (5)
    M apis/s3/src/main/java/org/jclouds/s3/blobstore/S3BlobStore.java (14)
    M apis/s3/src/main/java/org/jclouds/s3/blobstore/config/S3BlobStoreContextModule.java (6)
    D apis/s3/src/main/java/org/jclouds/s3/blobstore/strategy/AsyncMultipartUploadStrategy.java (31)
    D apis/s3/src/main/java/org/jclouds/s3/blobstore/strategy/MultipartUploadStrategy.java (28)
    D apis/s3/src/main/java/org/jclouds/s3/blobstore/strategy/internal/ParallelMultipartUploadStrategy.java (292)
    D apis/s3/src/main/java/org/jclouds/s3/blobstore/strategy/internal/SequentialMultipartUploadStrategy.java (123)
    D apis/s3/src/test/java/org/jclouds/s3/blobstore/strategy/internal/SequentialMultipartUploadStrategyMockTest.java (147)
    M apis/swift/src/main/java/org/jclouds/openstack/swift/blobstore/SwiftBlobStore.java (5)
    M blobstore/src/main/java/org/jclouds/blobstore/internal/BaseBlobStore.java (28)
    M providers/aws-s3/src/main/java/org/jclouds/aws/s3/blobstore/AWSS3BlobStore.java (16)
    M providers/azureblob/src/main/java/org/jclouds/azureblob/blobstore/AzureBlobStore.java (13)
    D providers/azureblob/src/main/java/org/jclouds/azureblob/blobstore/strategy/AzureBlobBlockUploadStrategy.java (82)
    D providers/azureblob/src/main/java/org/jclouds/azureblob/blobstore/strategy/MultipartUploadStrategy.java (34)
    M providers/azureblob/src/test/java/org/jclouds/azureblob/blobstore/integration/AzureBlobIntegrationLiveTest.java (31)
    D providers/azureblob/src/test/java/org/jclouds/azureblob/blobstore/strategy/AzureBlobBlockUploadStrategyTest.java (123)
    M providers/hpcloud-objectstorage/src/main/java/org/jclouds/hpcloud/objectstorage/blobstore/HPCloudObjectStorageBlobStore.java (5)

-- Patch Links --

https://github.com/jclouds/jclouds/pull/762.patch
https://github.com/jclouds/jclouds/pull/762.diff

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/762

Re: [jclouds] JCLOUDS-894: Add portable multipart upload (#762)

Posted by Timur Alperovich <no...@github.com>.
AWS Java S3 SDK does the following:

    public static long calculateOptimalPartSize(PutObjectRequest putObjectRequest, TransferManagerConfiguration configuration) {
        double contentLength = TransferManagerUtils.getContentLength(putObjectRequest);
        double optimalPartSize = (double)contentLength / (double)MAXIMUM_UPLOAD_PARTS;
        // round up so we don't push the upload over the maximum number of parts
        optimalPartSize = Math.ceil(optimalPartSize);
        return (long)Math.max(optimalPartSize, configuration.getMinimumUploadPartSize());
    }

The AWS SDK defaults to a maximum of 10,000 parts, and the default minimum part size is 5 MB. So, uploading a 51 GB file (51 × 2^30 bytes), for example, would use 10,000 parts of about 5.2 MB each, because the code rounds the part size up to the next byte rather than to a whole megabyte.

jclouds could use a similar mechanism. It would probably make sense to expose the configuration parameters so that users can change the default behavior.
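
As a worked example of the calculation quoted above, here is a minimal standalone sketch using integer arithmetic. The class and method names are hypothetical, and the two constants are simply the defaults mentioned in this comment (10,000 parts, 5 MB minimum); it is not the SDK's or jclouds' actual code.

```java
final class PartSizeExample {
    // Assumed defaults, taken from the comment above rather than from any API.
    static final long MAXIMUM_UPLOAD_PARTS = 10_000;
    static final long MINIMUM_PART_SIZE = 5L * 1024 * 1024; // 5 MB

    // Smallest part size (in bytes) that keeps the upload within
    // MAXIMUM_UPLOAD_PARTS, but never below the configured minimum.
    static long calculateOptimalPartSize(long contentLength, long minimumPartSize) {
        long optimal = ceilDiv(contentLength, MAXIMUM_UPLOAD_PARTS);
        return Math.max(optimal, minimumPartSize);
    }

    // Number of parts needed to cover contentLength with the given part size.
    static long partCount(long contentLength, long partSize) {
        return ceilDiv(contentLength, partSize);
    }

    // Ceiling division for non-negative a and positive b, without overflow.
    static long ceilDiv(long a, long b) {
        return a / b + (a % b == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        long contentLength = 51L * 1024 * 1024 * 1024; // 51 GB as 51 * 2^30 bytes
        long partSize = calculateOptimalPartSize(contentLength, MINIMUM_PART_SIZE);
        // prints "5476084 bytes per part, 10000 parts"
        System.out.println(partSize + " bytes per part, "
                + partCount(contentLength, partSize) + " parts");
    }
}
```

For blobs smaller than about 48.8 GB (10,000 × 5 MB), the minimum part size wins and the part count drops below 10,000.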

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/762#issuecomment-108633453

Re: [jclouds] JCLOUDS-894: Add portable multipart upload (#762)

Posted by Andrew Gaul <no...@github.com>.
> @@ -275,4 +285,20 @@ public String copyBlob(String fromContainer, String fromName, String toContainer
>           Closeables2.closeQuietly(is);
>        }
>     }
> +
> +   // TODO: parallel uploads
> +   @Beta
> +   protected String putMultipartBlob(String container, Blob blob, PutOptions overrides) {
> +      MultipartUpload mpu = initiateMultipartUpload(container, blob.getMetadata());
> +      List<MultipartPart> parts = Lists.newArrayList();
> +      long contentLength = blob.getMetadata().getContentMetadata().getContentLength();
> +      long partSize = getMaximumMultipartPartSize();  // TODO: optimal?

We need a better strategy here -- we should pick a combination of minimum part size, maximum part size, and number of parts.  A good combination re-sends less data after a network error and makes better use of the uplink via parallel uploads.
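
One possible shape for such a strategy, as a rough sketch only: pick the smallest part size that respects all three limits, and fail early when no size can. `PartSizePlanner` and its parameters are hypothetical names for illustration, not part of jclouds or of this pull request.

```java
final class PartSizePlanner {
    private final long minPartSize;
    private final long maxPartSize;
    private final long maxParts;

    // All three limits are provider-specific; callers supply them (hypothetical API).
    PartSizePlanner(long minPartSize, long maxPartSize, long maxParts) {
        if (minPartSize <= 0 || maxPartSize < minPartSize || maxParts <= 0) {
            throw new IllegalArgumentException("inconsistent limits");
        }
        this.minPartSize = minPartSize;
        this.maxPartSize = maxPartSize;
        this.maxParts = maxParts;
    }

    // Returns the smallest allowed part size that fits the blob into maxParts parts.
    // Smaller parts mean less data to re-send when a single part fails.
    long choosePartSize(long contentLength) {
        if (contentLength < 0) {
            throw new IllegalArgumentException("negative content length");
        }
        // Ceiling division written to avoid overflow for very large lengths.
        long needed = contentLength / maxParts + (contentLength % maxParts == 0 ? 0 : 1);
        long partSize = Math.max(minPartSize, needed);
        if (partSize > maxPartSize) {
            throw new IllegalArgumentException(
                    "blob of " + contentLength + " bytes exceeds maxPartSize * maxParts");
        }
        return partSize;
    }
}
```

Preferring the smallest valid size is one design choice among several; a provider with high per-request overhead might instead want larger parts and fewer requests.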

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/762/files#r31492595

Re: [jclouds] JCLOUDS-894: Add portable multipart upload (#762)

Posted by Andrew Gaul <no...@github.com>.
@danbroudy @kahing @zack-shoylev This pull request follows on from the earlier one that exposed the component multipart operations.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/762#issuecomment-107800525

Re: [jclouds] JCLOUDS-894: Add portable multipart upload (#762)

Posted by Timur Alperovich <no...@github.com>.
> @@ -275,4 +285,20 @@ public String copyBlob(String fromContainer, String fromName, String toContainer
>           Closeables2.closeQuietly(is);
>        }
>     }
> +
> +   // TODO: parallel uploads
> +   @Beta
> +   protected String putMultipartBlob(String container, Blob blob, PutOptions overrides) {
> +      MultipartUpload mpu = initiateMultipartUpload(container, blob.getMetadata());
> +      List<MultipartPart> parts = Lists.newArrayList();
> +      long contentLength = blob.getMetadata().getContentMetadata().getContentLength();
> +      long partSize = getMaximumMultipartPartSize();  // TODO: optimal?

AWS Java S3 SDK does the following:

    public static long calculateOptimalPartSize(PutObjectRequest putObjectRequest, TransferManagerConfiguration configuration) {
        double contentLength = TransferManagerUtils.getContentLength(putObjectRequest);
        double optimalPartSize = (double)contentLength / (double)MAXIMUM_UPLOAD_PARTS;
        // round up so we don't push the upload over the maximum number of parts
        optimalPartSize = Math.ceil(optimalPartSize);
        return (long)Math.max(optimalPartSize, configuration.getMinimumUploadPartSize());
    }

The AWS SDK defaults to a maximum of 10,000 parts, and the default minimum part size is 5 MB. So, uploading a 51 GB file (51 × 2^30 bytes), for example, would use 10,000 parts of about 5.2 MB each, because the code rounds the part size up to the next byte rather than to a whole megabyte.

jclouds could use a similar mechanism. It would probably make sense to expose the configuration parameters so that users can change the default behavior.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/762/files#r31677839

Re: [jclouds] JCLOUDS-894: Add portable multipart upload (#762)

Posted by Andrew Gaul <no...@github.com>.
> @@ -275,4 +285,20 @@ public String copyBlob(String fromContainer, String fromName, String toContainer
>           Closeables2.closeQuietly(is);
>        }
>     }
> +
> +   // TODO: parallel uploads
> +   @Beta
> +   protected String putMultipartBlob(String container, Blob blob, PutOptions overrides) {
> +      MultipartUpload mpu = initiateMultipartUpload(container, blob.getMetadata());
> +      List<MultipartPart> parts = Lists.newArrayList();
> +      long contentLength = blob.getMetadata().getContentMetadata().getContentLength();
> +      long partSize = getMaximumMultipartPartSize();  // TODO: optimal?

I moved the S3 `MultipartUploadSlicingAlgorithm` to core, so we have the same algorithm as before.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/762/files#r31872080