Posted to dev@cloudstack.apache.org by Min Chen <mi...@citrix.com> on 2013/06/03 20:35:58 UTC

[DISCUSS]Object_Store design: S3TemplateDownloader Implementation Issues

Hi there,

This thread is to address John's review comments on the S3TemplateDownloader implementation. From the previous thread, there are two major concerns about this class's implementation.

1. We have used the HttpClient library in this class. I can explain why I need HttpClient when downloading an object to S3. Currently, our download logic is as follows:

-- Get the object's total size and an InputStream from an HTTP URL by invoking an HttpClient library method.
-- Invoke the S3Utils API to download the InputStream to S3 (this part is purely the S3 API), and get the actual object size stored in S3 on completion.
-- Compare the object's total size with the actual downloaded size, and report a truncation error if they differ.

John's concern is about step 1 above. We can get rid of the HttpClient library for obtaining the InputStream from a URL, but I don't know how I can easily get the object size from a URL. In a previous email, John, you mentioned that I could use the S3 API getObjectMetaData to get the object size, but my understanding is that that API only applies to objects already in S3. In my flow, I need the size of an object that is about to be downloaded to S3, not one already in S3. I am willing to hear your suggestions here.
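One possible way to get the remote size without the HttpClient dependency is a plain HEAD request through the JDK's own HttpURLConnection. This is only a sketch of the idea, not code from the branch; RemoteSizeProbe, getRemoteSize, and isTruncated are hypothetical names:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class RemoteSizeProbe {

    // Issue a HEAD request and read the Content-Length header,
    // so no response body is transferred just to learn the size.
    // Returns -1 if the server does not advertise a length.
    public static long getRemoteSize(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("HEAD");
            return conn.getContentLengthLong();
        } finally {
            conn.disconnect();
        }
    }

    // Step 3 of the flow above: compare the advertised size with the
    // number of bytes actually stored to detect truncation.
    public static boolean isTruncated(long remoteSize, long actualSize) {
        return remoteSize >= 0 && actualSize != remoteSize;
    }
}
```

Note that some servers omit Content-Length (e.g. chunked responses), so the caller still has to handle the -1 case.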

2. John pointed out an issue with the current download method implementation in this class: I used the low-level S3 API PutObjectRequest to put an InputStream to S3, which has a bug in that it cannot handle objects > 5GB. That is true; after reading the S3 documentation on multipart upload, I am sorry that I am not an expert on S3 and did not know that earlier when I implemented this method. Fixing it should not take long, based on this AWS sample (http://docs.aws.amazon.com/AmazonS3/latest/dev/HLTrackProgressMPUJava.html) that uses TransferManager; it just needs some testing time. IMHO, this bug should not become a major issue blocking the object_store branch merge; it needs only a few days to fix, assuming we have an extension. Even without an extension, I personally think this can definitely be resolved in master with a simple bug fix.
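For reference, the AWS sample linked above boils down to something like the following sketch against the Java SDK 1.x TransferManager; the bucket, key, file name, and credentials here are placeholders, not values from the branch:

```java
import java.io.File;

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.Upload;

public class MultipartUploadSketch {
    public static void main(String[] args) throws Exception {
        // TransferManager transparently switches to multipart upload for
        // large objects, so streams/files > 5GB are handled correctly,
        // unlike a single low-level putObject call.
        TransferManager tm = new TransferManager(
                new BasicAWSCredentials("accessKey", "secretKey"));

        // Transfers run asynchronously; upload() returns immediately.
        Upload upload = tm.upload("my-bucket", "template/key", new File("template.ova"));
        upload.waitForCompletion(); // block until the multipart upload finishes

        tm.shutdownNow();
    }
}
```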

Thanks
-min


Re: [DISCUSS]Object_Store design: S3TemplateDownloader Implementation Issues

Posted by Min Chen <mi...@citrix.com>.
Never mind, I have resolved this issue.

Thanks
-min

On 6/4/13 4:35 PM, "Min Chen" <mi...@citrix.com> wrote:

>[quoted message trimmed; it repeats the message below in full]


Re: [DISCUSS]Object_Store design: S3TemplateDownloader Implementation Issues

Posted by Min Chen <mi...@citrix.com>.
John,

I am trying to fix issue #2 mentioned here to handle multi-part upload using TransferManager, but I ran into the following issue:
"2013-06-04 23:06:52,626 INFO [amazonaws.http.AmazonHttpClient] (s3-transfer-manager-worker-1:) Unable to execute HTTP request: peer not authenticated
javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated"

This is the part of the modified code that caused this exception:

        InputStream in = !chunked ? new BufferedInputStream(request.getResponseBodyAsStream())
                : new ChunkedInputStream(request.getResponseBodyAsStream());

        s_logger.info("Starting download from " + getDownloadUrl() + " to s3 bucket " + s3.getBucketName()
                + " remoteSize=" + remoteSize + " , max size=" + maxTemplateSizeInByte);

        Date start = new Date();

        // compute s3 key
        s3Key = join(asList(installPath, fileName), S3Utils.SEPARATOR);

        // multi-part upload using the S3 API to handle input streams > 5GB
        AWSCredentials myCredentials = new BasicAWSCredentials(s3.getAccessKey(), s3.getSecretKey());
        TransferManager tm = new TransferManager(myCredentials);

        // upload using the S3 API
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(remoteSize);
        PutObjectRequest putObjectRequest = new PutObjectRequest(s3.getBucketName(), s3Key, in, metadata)
                .withStorageClass(StorageClass.ReducedRedundancy);

        // register progress listener
        putObjectRequest.setProgressListener(new ProgressListener() {
            @Override
            public void progressChanged(ProgressEvent progressEvent) {
                // s_logger.debug(progressEvent.getBytesTransfered()
                //         + " bytes transferred " + new Date());
                totalBytes += progressEvent.getBytesTransfered();
                if (progressEvent.getEventCode() == ProgressEvent.COMPLETED_EVENT_CODE) {
                    s_logger.info("download completed");
                    status = TemplateDownloader.Status.DOWNLOAD_FINISHED;
                } else if (progressEvent.getEventCode() == ProgressEvent.FAILED_EVENT_CODE) {
                    status = TemplateDownloader.Status.UNRECOVERABLE_ERROR;
                } else if (progressEvent.getEventCode() == ProgressEvent.CANCELED_EVENT_CODE) {
                    status = TemplateDownloader.Status.ABORTED;
                } else {
                    status = TemplateDownloader.Status.IN_PROGRESS;
                }
            }
        });

        // TransferManager processes all transfers asynchronously,
        // so this call will return immediately.
        Upload upload = tm.upload(putObjectRequest);
        upload.waitForCompletion();

Can you point out what I am doing wrong here? The previous code, which used the low-level S3 putObject API, did not have this issue.
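For what it's worth, one plausible cause (an assumption on my part, not a confirmed diagnosis of the resolved issue) is that new TransferManager(myCredentials) builds its own default AmazonS3Client, discarding whatever endpoint/protocol configuration the existing S3 client carries; SDK 1.x also offers a constructor that reuses a pre-configured client. The endpoint and credentials below are placeholders:

```java
import com.amazonaws.ClientConfiguration;
import com.amazonaws.Protocol;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.transfer.TransferManager;

public class TransferManagerClientReuse {
    public static void main(String[] args) {
        // Hypothetical configuration for an internal, non-AWS object store
        // whose certificate would fail default SSL peer verification.
        ClientConfiguration config = new ClientConfiguration();
        config.setProtocol(Protocol.HTTP); // or keep HTTPS with a trusted cert

        AmazonS3Client s3Client = new AmazonS3Client(
                new BasicAWSCredentials("accessKey", "secretKey"), config);
        s3Client.setEndpoint("s3.internal.example.com");

        // Reuse the configured client instead of letting TransferManager
        // construct a default one with default SSL settings.
        TransferManager tm = new TransferManager(s3Client);
        tm.shutdownNow();
    }
}
```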

Thanks
-min

On 6/3/13 11:35 AM, "Min Chen" <mi...@citrix.com> wrote:

>[quoted original message trimmed; it repeats the first message of this thread in full]