Posted to dev@cloudstack.apache.org by Thomas O'Dowd <tp...@cloudian.com> on 2013/06/05 11:31:06 UTC

Object based Secondary storage.

Hi all,

I'm new here. I'm interested in CloudStack secondary storage using S3
object stores. I checked out and built CloudStack today and found the
object_store branch (not built yet). I haven't done Java since 2004
(mostly Erlang/C++/Python) so I'm rusty, but I know the finer parts of
S3 :-)

Anyway - I'm thinking I can help in my spare time. Any pointers to the
new object store secondary storage design are appreciated. Someone on
IRC already pointed out the merge request mail archive, which I've read.
What timezone are the main folks working in? I'm GMT+9.

Tom.
-- 
Cloudian KK - http://www.cloudian.com/get-started.html
Fancy 100TB of full featured S3 Storage?
Checkout the Cloudian® Community Edition!


Re: Object based Secondary storage.

Posted by John Burwell <jb...@basho.com>.
Thomas,

When using TransferManager, as we are in CloudStack, the MD5 hashes are calculated by the Amazon AWS Java client.  It also determines how best to utilize multi-part upload, if at all.  I just want to ensure that folks understand the information below applies when interacting with the HTTP API, but that the Amazon AWS Java client handles most of these details for the developer.
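For reference, here is roughly what that usage looks like (an illustrative sketch, not the actual CloudStack code -- the bucket, key, and file names are made up, and "s3" stands for an already-configured AmazonS3 client):

    import java.io.File;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.transfer.TransferManager;
    import com.amazonaws.services.s3.transfer.Upload;

    TransferManager tm = new TransferManager(s3);
    // The client computes the MD5 hashes itself and decides, based on its
    // configured threshold, whether to switch to multi-part upload.
    Upload upload = tm.upload("my-bucket", "my/key", new File("template.vhd"));
    upload.waitForCompletion(); // blocks until done; throws on failure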

Thanks,
-John

On Jun 6, 2013, at 9:10 PM, Thomas O'Dowd <tp...@cloudian.com> wrote:

> Hi guys,
> 
> The ETAG is an interesting subject. AWS currently maintains 2 different
> types of ETAGS for objects that I know of.
> 
>  a) PUT OBJECT - the assigned ETAG is calculated from the MD5 checksum
> of the data content that you are uploading. When uploading, you should
> also always set the Content-MD5 header so that AWS (or other S3 stores)
> can verify your MD5 checksum against what it receives. For AWS, the ETAG
> of such objects will be the MD5 checksum of the content, though I guess
> it doesn't have to be for other S3 stores. What's important is that AWS
> will reject your upload if the MD5 checksum it calculates is not the
> same as your Content-MD5 header.
> 
>  b) MULTIPART OBJECTS - A multipart object is an object which is
> uploaded using multiple PUT requests, each of which uploads one part.
> Parts can be uploaded out of order and in parallel, so AWS cannot
> calculate the MD5 checksum for the entire object without waiting until
> all parts have been uploaded and then reprocessing all the data. That
> would be very heavy for various reasons, so they don't do it. The ETAG
> therefore cannot be calculated from the MD5 checksum of the content
> either. I don't know exactly how AWS calculates its ETAG for multipart
> objects, but the ETAG will always take the form XXXXXXXX-YYY, where the
> X part looks like a regular MD5 checksum of sorts and the Y part is the
> number of parts that made up the upload. You can therefore always tell
> that an object was uploaded using a multipart upload by checking whether
> its ETAG ends with -YYY. This, however, may only be true for AWS - other
> S3 stores may do it differently. You should really just treat the ETAG
> as opaque.
> 
> Some more best practices for multipart uploads:
> 1. Always calculate the MD5 checksum of each part and send the
> Content-MD5 header. This way AWS can verify the content of each part as
> you upload it.
> 2. Always retain the ETAG for each part as returned by the response of
> each part upload. You should have an etag for each part you uploaded.
> 3. Refrain from asking the server for a list of parts in order to create
> the final Multipart Upload complete request. Always use your list of
> parts and your list of ETAGS (from point 2). The exception is when you
> are doing recovery after some client crash.
> 
> The main reason for this is that AWS and most other S3 stores are based
> on eventual consistency, so the server may not always (though it mostly
> does) give you a correct list of parts. The Multipart Upload complete
> request also allows you to drop parts, so if you ask the server for a
> list of parts and it temporarily misses one, you may end up with an
> object that is missing a part.
> 
> Btw, shameless plug, but Cloudian has very good compatibility with AWS
> and has a community edition that is free for up to 100TB. I'll test
> against it but you may also like to. You can run it on a single node
> without much fuss. Feel free to ask me about it offline.
> 
> Anyway, hope that helps,
> 
> Tom.
> 
> On Thu, 2013-06-06 at 22:57 +0000, Edison Su wrote:
>> The ETags created by RIAK CS and Amazon S3 seem a little bit different in the case of multipart upload.
>> 
>> Here is the result I tested on both RIAK CS and Amazon S3, with s3cmd.
>> Test environment:
>> S3cmd: version 1.5.0-alpha1
>> Riak cs:
>> Name        : riak
>> Arch        : x86_64
>> Version     : 1.3.1
>> Release     : 1.el6
>> Size        : 40 M
>> Repo        : installed
>> From repo   : basho-products
>> 
>> The command I used to put:
>> s3cmd put some-file s3://some-path --multipart-chunk-size-mb=100 -v -d
>> 
>> The etag created for the file, when using Riak CS is WxEUkiQzTWm_2C8A92fLQg==
>> 
>> DEBUG: Sending request method_string='POST', uri='http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test?uploadId=kfDkh7Q_QCWN7r0ZTqNq4Q==', headers={'content-length': '309', 'Authorization': 'AWS OYAZXCAFUC1DAFOXNJWI:xlkHI9tUfUV/N+Ekqpi7Jz/pbOI=', 'x-amz-date': 'Thu, 06 Jun 2013 22:54:28 +0000'}, body=(309 bytes)
>> DEBUG: Response: {'status': 200, 'headers': {'date': 'Thu, 06 Jun 2013 22:40:09 GMT', 'content-length': '326', 'content-type': 'application/xml', 'server': 'Riak CS'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?><CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test</Location><Bucket>imagestore</Bucket><Key>tmpl/1/1/routing-1/test</Key><ETag>kfDkh7Q_QCWN7r0ZTqNq4Q==</ETag></CompleteMultipartUploadResult>'}
>> 
>> While the etag created by Amazon S3 is: &quot;70e1860be687d43c039873adef4280f2-3&quot;
>> 
>> DEBUG: Sending request method_string='POST', uri='/fixes/icecake/systdfdfdfemvm.iso1?uploadId=vdkPSAtaA7g.fdfdfdfdf..iaKRNW_8QGz.bXdfdfdfdfdfkFXwUwLzRcG5obVvJFDvnhYUFdT6fYr1rig--', 
>> DEBUG: Response: {'status': 200, 'headers': {, 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'Keep-Alive', 'x-amz-request-id': '8DFF5D8025E58E99', 'cache-control': 'proxy-revalidate', 'date': 'Thu, 06 Jun 2013 22:39:47 GMT', 'content-type': 'application/xml'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?>\n\n<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://fdfdfdfdfdfdf</Location>Key>fixes/icecake/systemvm.iso1</Key><ETag>&quot;70e1860be687d43c039873adef4280f2-3&quot;</ETag></CompleteMultipartUploadResult>'}
>> 
>> So the ETag created on Amazon S3 has a "-" (dash) in it, but there is only a "_" (underscore) on Riak CS.
>> 
>> Do you know the reason? What do we need to do to make it compatible with the Amazon S3 SDK?
>> 
>>> -----Original Message-----
>>> From: John Burwell [mailto:jburwell@basho.com]
>>> Sent: Thursday, June 06, 2013 2:03 PM
>>> To: dev@cloudstack.apache.org
>>> Subject: Re: Object based Secondary storage.
>>> 
>>> Min,
>>> 
>>> Are you calculating the MD5 or letting the Amazon client do it?
>>> 
>>> Thanks,
>>> -John
>>> 
>>> On Jun 6, 2013, at 4:54 PM, Min Chen <mi...@citrix.com> wrote:
>>> 
>>>> Thanks Tom. Indeed I have an S3 question that needs some advice from
>>>> some S3 experts. To support uploading objects > 5G, I have used
>>>> TransferManager.upload to upload objects to S3; the upload went fine and
>>>> objects were successfully put to S3. However, later on when I am using
>>>> "s3cmd get <object key>" to retrieve such an object, I always got this exception:
>>>>
>>>> "MD5 signatures do not match: computed=Y, received=X"
>>>>
>>>> It seems that Amazon S3 keeps a different MD5 sum for a multi-part
>>>> uploaded object. We have been using Riak CS for our S3 testing. If I
>>>> change to not using multi-part upload and directly invoke S3
>>>> putObject, I do not run into this issue. Have you seen this before?
>>>> 
>>>> -min
>>>> 
>>>> On 6/6/13 1:56 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
>>>> 
>>>>> Thanks Min. I've printed out the material and am reading new threads.
>>>>> Can't comment much yet until I understand things a bit more.
>>>>> 
>>>>> Meanwhile, feel free to hit me up with any S3 questions you have. I'm
>>>>> looking forward to playing with the object_store branch and testing
>>>>> it out.
>>>>> 
>>>>> Tom.
>>>>> 
>>>>> On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
>>>>>> Welcome Tom. You can check out this FS
>>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Object+Store+Plugin+Framework
>>>>>> for the secondary storage architectural work done in the object_store
>>>>>> branch. You may also check out the following recent threads regarding
>>>>>> 3 major technical questions raised by the community, as well as our
>>>>>> answers and clarification.
>>>>>>
>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77B337AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
>>>>>>
>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD22955.3DDDC%25min.chen%40citrix.com%3E
>>>>>>
>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2300D.3DE0C%25min.chen%40citrix.com%3E
>>>>>> 
>>>>>> 
>>>>>> That branch is mainly worked on by Edison and me, and we are in the
>>>>>> PST timezone.
>>>>>> 
>>>>>> Thanks
>>>>>> -min
>>>>> --
>>>>> Cloudian KK - http://www.cloudian.com/get-started.html
>>>>> Fancy 100TB of full featured S3 Storage?
>>>>> Checkout the Cloudian® Community Edition!
>>>>> 
>>>> 
>> 
> 
> -- 
> Cloudian KK - http://www.cloudian.com/get-started.html
> Fancy 100TB of full featured S3 Storage?
> Checkout the Cloudian® Community Edition!
> 


Re: Object based Secondary storage.

Posted by Thomas O'Dowd <tp...@cloudian.com>.
Hi guys,

The ETAG is an interesting subject. AWS currently maintains 2 different
types of ETAGS for objects that I know of.

  a) PUT OBJECT - the assigned ETAG is calculated from the MD5 checksum
of the data content that you are uploading. When uploading, you should
also always set the Content-MD5 header so that AWS (or other S3 stores)
can verify your MD5 checksum against what it receives. For AWS, the ETAG
of such objects will be the MD5 checksum of the content, though I guess
it doesn't have to be for other S3 stores. What's important is that AWS
will reject your upload if the MD5 checksum it calculates is not the
same as your Content-MD5 header. (A small sketch of setting Content-MD5
follows after case b.)

  b) MULTIPART OBJECTS - A multipart object is an object which is
uploaded using multiple PUT requests, each of which uploads one part.
Parts can be uploaded out of order and in parallel, so AWS cannot
calculate the MD5 checksum for the entire object without waiting until
all parts have been uploaded and then reprocessing all the data. That
would be very heavy for various reasons, so they don't do it. The ETAG
therefore cannot be calculated from the MD5 checksum of the content
either. I don't know exactly how AWS calculates its ETAG for multipart
objects, but the ETAG will always take the form XXXXXXXX-YYY, where the
X part looks like a regular MD5 checksum of sorts and the Y part is the
number of parts that made up the upload. You can therefore always tell
that an object was uploaded using a multipart upload by checking whether
its ETAG ends with -YYY. This, however, may only be true for AWS - other
S3 stores may do it differently. You should really just treat the ETAG
as opaque.
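
To make case a) concrete, here is a minimal sketch using the AWS SDK for
Java (illustrative only -- the bucket, key and file names are made up,
"s3" stands for an initialized AmazonS3 client, and error handling is
omitted):

    import java.io.File;
    import java.io.FileInputStream;
    import java.security.DigestInputStream;
    import java.security.MessageDigest;
    import javax.xml.bind.DatatypeConverter;
    import com.amazonaws.services.s3.model.ObjectMetadata;
    import com.amazonaws.services.s3.model.PutObjectRequest;

    // Stream the file through an MD5 digest first.
    File file = new File("template.vhd");
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    DigestInputStream din = new DigestInputStream(new FileInputStream(file), md5);
    byte[] buf = new byte[8192];
    while (din.read(buf) != -1) { /* just digesting */ }
    din.close();

    // Content-MD5 is the base64 (not hex) encoding of the raw 16-byte digest.
    ObjectMetadata meta = new ObjectMetadata();
    meta.setContentLength(file.length());
    meta.setContentMD5(DatatypeConverter.printBase64Binary(md5.digest()));

    // The store recomputes the MD5 as it receives the body and rejects
    // the PUT if it doesn't match the header.
    s3.putObject(new PutObjectRequest("my-bucket", "my/key",
        new FileInputStream(file), meta));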

Some more best practices for multipart uploads:
1. Always calculate the MD5 checksum of each part and send the
Content-MD5 header. This way AWS can verify the content of each part as
you upload it.
2. Always retain the ETAG for each part as returned by the response of
each part upload. You should have an etag for each part you uploaded.
3. Refrain from asking the server for a list of parts in order to create
the final Multipart Upload complete request. Always use your list of
parts and your list of ETAGS (from point 2). The exception is when you
are doing recovery after some client crash.

The main reason for this is that AWS and most other S3 stores are based
on eventual consistency, so the server may not always (though it mostly
does) give you a correct list of parts. The Multipart Upload complete
request also allows you to drop parts, so if you ask the server for a
list of parts and it temporarily misses one, you may end up with an
object that is missing a part.
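
To make points 2 and 3 concrete, here is a rough sketch of that flow
with the AWS SDK for Java (again illustrative only -- names are made up,
"s3" is an initialized client, and error handling plus the per-part
Content-MD5 from point 1 are omitted):

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
    import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
    import com.amazonaws.services.s3.model.PartETag;
    import com.amazonaws.services.s3.model.UploadPartRequest;

    String bucket = "my-bucket", key = "my/key";
    File file = new File("big-template.vhd");
    long partSize = 100L * 1024 * 1024; // e.g. 100MB parts

    String uploadId = s3.initiateMultipartUpload(
        new InitiateMultipartUploadRequest(bucket, key)).getUploadId();

    // Point 2: retain the ETAG of every part we upload ourselves.
    List<PartETag> partETags = new ArrayList<PartETag>();
    long offset = 0;
    for (int part = 1; offset < file.length(); part++) {
        long size = Math.min(partSize, file.length() - offset);
        partETags.add(s3.uploadPart(new UploadPartRequest()
            .withBucketName(bucket).withKey(key)
            .withUploadId(uploadId).withPartNumber(part)
            .withFile(file).withFileOffset(offset).withPartSize(size))
            .getPartETag());
        offset += size;
    }

    // Point 3: complete using OUR part list, not one listed by the server.
    s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
        bucket, key, uploadId, partETags));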

Btw, shameless plug, but Cloudian has very good compatibility with AWS
and has a community edition that is free for up to 100TB. I'll test
against it but you may also like to. You can run it on a single node
without much fuss. Feel free to ask me about it offline.

Anyway, hope that helps,

Tom.

On Thu, 2013-06-06 at 22:57 +0000, Edison Su wrote:
> The ETags created by RIAK CS and Amazon S3 seem a little bit different in the case of multipart upload.
> 
> Here is the result I tested on both RIAK CS and Amazon S3, with s3cmd.
> Test environment:
> S3cmd: version 1.5.0-alpha1
> Riak cs:
> Name        : riak
> Arch        : x86_64
> Version     : 1.3.1
> Release     : 1.el6
> Size        : 40 M
> Repo        : installed
> From repo   : basho-products
> 
> The command I used to put:
> s3cmd put some-file s3://some-path --multipart-chunk-size-mb=100 -v -d
> 
> The etag created for the file, when using Riak CS is WxEUkiQzTWm_2C8A92fLQg==
> 
> DEBUG: Sending request method_string='POST', uri='http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test?uploadId=kfDkh7Q_QCWN7r0ZTqNq4Q==', headers={'content-length': '309', 'Authorization': 'AWS OYAZXCAFUC1DAFOXNJWI:xlkHI9tUfUV/N+Ekqpi7Jz/pbOI=', 'x-amz-date': 'Thu, 06 Jun 2013 22:54:28 +0000'}, body=(309 bytes)
> DEBUG: Response: {'status': 200, 'headers': {'date': 'Thu, 06 Jun 2013 22:40:09 GMT', 'content-length': '326', 'content-type': 'application/xml', 'server': 'Riak CS'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?><CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test</Location><Bucket>imagestore</Bucket><Key>tmpl/1/1/routing-1/test</Key><ETag>kfDkh7Q_QCWN7r0ZTqNq4Q==</ETag></CompleteMultipartUploadResult>'}
> 
> While the etag created by Amazon S3 is: &quot;70e1860be687d43c039873adef4280f2-3&quot;
> 
> DEBUG: Sending request method_string='POST', uri='/fixes/icecake/systdfdfdfemvm.iso1?uploadId=vdkPSAtaA7g.fdfdfdfdf..iaKRNW_8QGz.bXdfdfdfdfdfkFXwUwLzRcG5obVvJFDvnhYUFdT6fYr1rig--', 
> DEBUG: Response: {'status': 200, 'headers': {, 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'Keep-Alive', 'x-amz-request-id': '8DFF5D8025E58E99', 'cache-control': 'proxy-revalidate', 'date': 'Thu, 06 Jun 2013 22:39:47 GMT', 'content-type': 'application/xml'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?>\n\n<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://fdfdfdfdfdfdf</Location>Key>fixes/icecake/systemvm.iso1</Key><ETag>&quot;70e1860be687d43c039873adef4280f2-3&quot;</ETag></CompleteMultipartUploadResult>'}
> 
> So the ETag created on Amazon S3 has a "-" (dash) in it, but there is only a "_" (underscore) on Riak CS.
> 
> Do you know the reason? What do we need to do to make it compatible with the Amazon S3 SDK?
> 
> > -----Original Message-----
> > From: John Burwell [mailto:jburwell@basho.com]
> > Sent: Thursday, June 06, 2013 2:03 PM
> > To: dev@cloudstack.apache.org
> > Subject: Re: Object based Secondary storage.
> > 
> > Min,
> > 
> > Are you calculating the MD5 or letting the Amazon client do it?
> > 
> > Thanks,
> > -John
> > 
> > On Jun 6, 2013, at 4:54 PM, Min Chen <mi...@citrix.com> wrote:
> > 
> > > Thanks Tom. Indeed I have an S3 question that needs some advice from
> > > some S3 experts. To support uploading objects > 5G, I have used
> > > TransferManager.upload to upload objects to S3; the upload went fine and
> > > objects were successfully put to S3. However, later on when I am using
> > > "s3cmd get <object key>" to retrieve such an object, I always got this exception:
> > >
> > > "MD5 signatures do not match: computed=Y, received=X"
> > >
> > > It seems that Amazon S3 keeps a different MD5 sum for a multi-part
> > > uploaded object. We have been using Riak CS for our S3 testing. If I
> > > change to not using multi-part upload and directly invoke S3
> > > putObject, I do not run into this issue. Have you seen this before?
> > >
> > > -min
> > >
> > > On 6/6/13 1:56 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
> > >
> > >> Thanks Min. I've printed out the material and am reading new threads.
> > >> Can't comment much yet until I understand things a bit more.
> > >>
> > >> Meanwhile, feel free to hit me up with any S3 questions you have. I'm
> > >> looking forward to playing with the object_store branch and testing
> > >> it out.
> > >>
> > >> Tom.
> > >>
> > >> On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
> > >>> Welcome Tom. You can check out this FS
> > >>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Object+Store+Plugin+Framework
> > >>> for the secondary storage architectural work done in the object_store
> > >>> branch. You may also check out the following recent threads regarding
> > >>> 3 major technical questions raised by the community, as well as our
> > >>> answers and clarification.
> > >>>
> > >>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77B337AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
> > >>>
> > >>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD22955.3DDDC%25min.chen%40citrix.com%3E
> > >>>
> > >>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2300D.3DE0C%25min.chen%40citrix.com%3E
> > >>>
> > >>>
> > >>> That branch is mainly worked on by Edison and me, and we are in the
> > >>> PST timezone.
> > >>>
> > >>> Thanks
> > >>> -min
> > >> --
> > >> Cloudian KK - http://www.cloudian.com/get-started.html
> > >> Fancy 100TB of full featured S3 Storage?
> > >> Checkout the Cloudian® Community Edition!
> > >>
> > >
> 

-- 
Cloudian KK - http://www.cloudian.com/get-started.html
Fancy 100TB of full featured S3 Storage?
Checkout the Cloudian® Community Edition!


Re: Object based Secondary storage.

Posted by John Burwell <jb...@basho.com>.
Min,

Cool.  I just wanted to make sure we weren't compressing the template and template.properties …

Thanks for the clarification,
-John

On Jun 17, 2013, at 12:49 PM, Min Chen <mi...@citrix.com> wrote:

> John,
> 
> 	Let me clarify: we don't do any extra compression before sending to S3.
> When the user provides a URL pointing to a compressed template during
> registration, we just download that template to S3 without decompressing
> it afterwards (unlike the current NFS path, which decompresses it). If
> the URL the user provides at registration is not in a compressed format,
> we just send the uncompressed version to S3.
> 
> 	Thanks
> 	-min
> 
> On 6/17/13 9:45 AM, "John Burwell" <jb...@basho.com> wrote:
> 
>> Min,
>> 
>> Why are objects being compressed before being sent to S3?
>> 
>> Thanks,
>> -John
>> 
>> On Jun 17, 2013, at 12:24 PM, Min Chen <mi...@citrix.com> wrote:
>> 
>>> Hi Tom,
>>> 
>>> 	Thanks for your testing. Glad to hear that multipart is working fine
>>> using Cloudian. Regarding your questions about the .gz template, that
>>> behavior is as expected. We upload it to S3 in its .gz format. Only when
>>> the template is used and downloaded to primary storage do we use the
>>> staging area to decompress it.
>>> 	We will look at the bugs you filed and update them accordingly.
>>> 
>>> 	-min
>>> 
>>> On 6/17/13 12:31 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
>>> 
>>>> Thanks Min - I filed 3 small issues today. I've a couple more but I
>>>> want
>>>> to try and repeat them again before I file them and I've no time right
>>>> now. Please let me know if you need any further detail on any of these.
>>>> 
>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-3027
>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-3028
>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-3030
>>>> 
>>>> An example of the other issues I'm running into is that when I upload
>>>> a .gz template on regular NFS storage, it is automatically decompressed
>>>> for me, whereas with S3 the template remains a .gz file. Is this
>>>> correct or not? Also, perhaps related: after successfully uploading
>>>> the template to S3 and then trying to start an instance using it, I can
>>>> select it and go all the way to the last screen (where I think the
>>>> action button says launch instance or something), and it fails with a
>>>> resource unreachable error. I'll have to dig up the error later and
>>>> file the bug, as my machine got rebooted over the weekend.
>>>>
>>>> The multipart upload looks like it is working correctly, though, and I
>>>> can verify that the checksums etc. match what they should be.
>>>> 
>>>> Tom.
>>>> 
>>>> On Fri, 2013-06-14 at 16:55 +0000, Min Chen wrote:
>>>>> Hi Tom,
>>>>>
>>>>> 	You can file a JIRA ticket for the object_store branch by prefixing
>>>>> your bug with "Object_Store_Refactor" and mentioning that it is using
>>>>> a build from object_store. Here is an example bug filed by Sangeetha
>>>>> against an object_store branch build:
>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-2528.
>>>>> 	If you use devcloud for testing, you may run into an issue where the
>>>>> SSVM cannot access a public URL when you register a template, so
>>>>> template registration will fail. You may have to set up an internal
>>>>> web server inside devcloud and post the template to be registered
>>>>> there, to give a URL that devcloud can access. We mainly used devcloud
>>>>> to run our TestNG automation tests earlier, and then switched to a
>>>>> real hypervisor for real testing.
>>>>> 	Thanks
>>>>> 	-min
>>>>> 
>>>>> On 6/14/13 1:46 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
>>>>> 
>>>>>> Edison,
>>>>>> 
>>>>>> I've got devcloud running along with the object_store branch and I've
>>>>>> finally been able to test a bit today.
>>>>>> 
>>>>>> I found some issues (or things that I think are bugs) and would like
>>>>>> to file a few. I know where the bug database is and I have an
>>>>>> account, but what is the best way to file bugs against this particular
>>>>>> branch? I guess I can select "Future" as the version? How else are
>>>>>> feature branches usually identified in issues? Perhaps in the subject?
>>>>>> Please let me know the preference.
>>>>>>
>>>>>> Also, can you describe (or point me at a document describing) the best
>>>>>> way to test against the object_store branch? So far I have been doing
>>>>>> the following, but I'm not sure it is the best:
>>>>>> 
>>>>>> a) setup devcloud.
>>>>>> b) stop any instances on devcloud from previous runs
>>>>>>    xe vm-shutdown --multiple
>>>>>> c) check out and update the object_store branch.
>>>>>> d) clean build as described in devcloud doc (ADIDD for short)
>>>>>> e) deploydb (ADIDD)
>>>>>> f) start management console (ADIDD) and wait for it.
>>>>>> g) deploysvr (ADIDD) in another shell.
>>>>>> h) on devcloud machine use xentop to wait for 2 vms to launch.
>>>>>>  (I'm not sure what the nfs vm is used for here??)
>>>>>> i) login on gui -> infra -> secondary and remove nfs secondary
>>>>>> storage
>>>>>> j) add s3 secondary storage (using cache of old secondary storage?)
>>>>>> 
>>>>>> Then the rest of the testing starts from here... (and also perhaps in step j)
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Tom.
>>>>>> -- 
>>>>>> Cloudian KK - http://www.cloudian.com/get-started.html
>>>>>> Fancy 100TB of full featured S3 Storage?
>>>>>> Checkout the Cloudian® Community Edition!
>>>>>> 
>>>>> 
>>>> 
>>>> -- 
>>>> Cloudian KK - http://www.cloudian.com/get-started.html
>>>> Fancy 100TB of full featured S3 Storage?
>>>> Checkout the Cloudian® Community Edition!
>>>> 
>>> 
>> 
> 


Re: Object based Secondary storage.

Posted by Min Chen <mi...@citrix.com>.
John,

	Let me clarify: we don't do any extra compression before sending to S3.
When the user provides a URL pointing to a compressed template during
registration, we just download that template to S3 without decompressing
it afterwards (unlike the current NFS path, which decompresses it). If
the URL the user provides at registration is not in a compressed format,
we just send the uncompressed version to S3.

	Thanks
	-min

On 6/17/13 9:45 AM, "John Burwell" <jb...@basho.com> wrote:

>Min,
>
>Why are objects being compressed before being sent to S3?
>
>Thanks,
>-John
>
>On Jun 17, 2013, at 12:24 PM, Min Chen <mi...@citrix.com> wrote:
>
>> Hi Tom,
>> 
>> 	Thanks for your testing. Glad to hear that multipart is working fine
>> using Cloudian. Regarding your questions about the .gz template, that
>> behavior is as expected. We upload it to S3 in its .gz format. Only when
>> the template is used and downloaded to primary storage do we use the
>> staging area to decompress it.
>> 	We will look at the bugs you filed and update them accordingly.
>> 
>> 	-min
>> 
>> On 6/17/13 12:31 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
>> 
>>> Thanks Min - I filed 3 small issues today. I've a couple more but I want
>>> to try and repeat them again before I file them and I've no time right
>>> now. Please let me know if you need any further detail on any of these.
>>> 
>>> https://issues.apache.org/jira/browse/CLOUDSTACK-3027
>>> https://issues.apache.org/jira/browse/CLOUDSTACK-3028
>>> https://issues.apache.org/jira/browse/CLOUDSTACK-3030
>>> 
>>> An example of the other issues I'm running into is that when I upload
>>> a .gz template on regular NFS storage, it is automatically decompressed
>>> for me, whereas with S3 the template remains a .gz file. Is this
>>> correct or not? Also, perhaps related: after successfully uploading
>>> the template to S3 and then trying to start an instance using it, I can
>>> select it and go all the way to the last screen (where I think the
>>> action button says launch instance or something), and it fails with a
>>> resource unreachable error. I'll have to dig up the error later and
>>> file the bug, as my machine got rebooted over the weekend.
>>>
>>> The multipart upload looks like it is working correctly, though, and I
>>> can verify that the checksums etc. match what they should be.
>>> 
>>> Tom.
>>> 
>>> On Fri, 2013-06-14 at 16:55 +0000, Min Chen wrote:
>>>> Hi Tom,
>>>>
>>>> 	You can file a JIRA ticket for the object_store branch by prefixing
>>>> your bug with "Object_Store_Refactor" and mentioning that it is using
>>>> a build from object_store. Here is an example bug filed by Sangeetha
>>>> against an object_store branch build:
>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-2528.
>>>> 	If you use devcloud for testing, you may run into an issue where the
>>>> SSVM cannot access a public URL when you register a template, so
>>>> template registration will fail. You may have to set up an internal
>>>> web server inside devcloud and post the template to be registered
>>>> there, to give a URL that devcloud can access. We mainly used devcloud
>>>> to run our TestNG automation tests earlier, and then switched to a
>>>> real hypervisor for real testing.
>>>> 	Thanks
>>>> 	-min
>>>> 
>>>> On 6/14/13 1:46 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
>>>> 
>>>>> Edison,
>>>>> 
>>>>> I've got devcloud running along with the object_store branch and I've
>>>>> finally been able to test a bit today.
>>>>> 
>>>>> I found some issues (or things that I think are bugs) and would like
>>>>> to file a few. I know where the bug database is and I have an
>>>>> account, but what is the best way to file bugs against this particular
>>>>> branch? I guess I can select "Future" as the version? How else are
>>>>> feature branches usually identified in issues? Perhaps in the subject?
>>>>> Please let me know the preference.
>>>>>
>>>>> Also, can you describe (or point me at a document describing) the best
>>>>> way to test against the object_store branch? So far I have been doing
>>>>> the following, but I'm not sure it is the best:
>>>>> 
>>>>> a) setup devcloud.
>>>>> b) stop any instances on devcloud from previous runs
>>>>>     xe vm-shutdown --multiple
>>>>> c) check out and update the object_store branch.
>>>>> d) clean build as described in devcloud doc (ADIDD for short)
>>>>> e) deploydb (ADIDD)
>>>>> f) start management console (ADIDD) and wait for it.
>>>>> g) deploysvr (ADIDD) in another shell.
>>>>> h) on devcloud machine use xentop to wait for 2 vms to launch.
>>>>>   (I'm not sure what the nfs vm is used for here??)
>>>>> i) login on gui -> infra -> secondary and remove nfs secondary
>>>>>storage
>>>>> j) add s3 secondary storage (using cache of old secondary storage?)
>>>>> 
>>>>> Then the rest of the testing starts from here... (and also perhaps in step j)
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Tom.
>>>>> -- 
>>>>> Cloudian KK - http://www.cloudian.com/get-started.html
>>>>> Fancy 100TB of full featured S3 Storage?
>>>>> Checkout the Cloudian® Community Edition!
>>>>> 
>>>> 
>>> 
>>> -- 
>>> Cloudian KK - http://www.cloudian.com/get-started.html
>>> Fancy 100TB of full featured S3 Storage?
>>> Checkout the Cloudian® Community Edition!
>>> 
>> 
>


Re: Object based Secondary storage.

Posted by John Burwell <jb...@basho.com>.
Min,

Why are objects being compressed before being sent to S3?

Thanks,
-John

On Jun 17, 2013, at 12:24 PM, Min Chen <mi...@citrix.com> wrote:

> Hi Tom,
> 
> 	Thanks for your testing. Glad to hear that multipart is working fine
> using Cloudian. Regarding your questions about the .gz template, that
> behavior is as expected. We upload it to S3 in its .gz format. Only when
> the template is used and downloaded to primary storage do we use the
> staging area to decompress it.
> 	We will look at the bugs you filed and update them accordingly.
> 
> 	-min
> 
> On 6/17/13 12:31 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
> 
>> Thanks Min - I filed 3 small issues today. I've a couple more but I want
>> to try and repeat them again before I file them and I've no time right
>> now. Please let me know if you need any further detail on any of these.
>> 
>> https://issues.apache.org/jira/browse/CLOUDSTACK-3027
>> https://issues.apache.org/jira/browse/CLOUDSTACK-3028
>> https://issues.apache.org/jira/browse/CLOUDSTACK-3030
>> 
>> An example of the other issues I'm running into is that when I upload
>> a .gz template on regular NFS storage, it is automatically decompressed
>> for me, whereas with S3 the template remains a .gz file. Is this
>> correct or not? Also, perhaps related: after successfully uploading
>> the template to S3 and then trying to start an instance using it, I can
>> select it and go all the way to the last screen (where I think the
>> action button says launch instance or something), and it fails with a
>> resource unreachable error. I'll have to dig up the error later and
>> file the bug, as my machine got rebooted over the weekend.
>>
>> The multipart upload looks like it is working correctly, though, and I
>> can verify that the checksums etc. match what they should be.
>> 
>> Tom.
>> 
>> On Fri, 2013-06-14 at 16:55 +0000, Min Chen wrote:
>>> Hi Tom,
>>>
>>> 	You can file a JIRA ticket for the object_store branch by prefixing
>>> your bug with "Object_Store_Refactor" and mentioning that it is using
>>> a build from object_store. Here is an example bug filed by Sangeetha
>>> against an object_store branch build:
>>> https://issues.apache.org/jira/browse/CLOUDSTACK-2528.
>>> 	If you use devcloud for testing, you may run into an issue where the
>>> SSVM cannot access a public URL when you register a template, so
>>> template registration will fail. You may have to set up an internal
>>> web server inside devcloud and post the template to be registered
>>> there, to give a URL that devcloud can access. We mainly used devcloud
>>> to run our TestNG automation tests earlier, and then switched to a
>>> real hypervisor for real testing.
>>> 	Thanks
>>> 	-min
>>> 
>>> On 6/14/13 1:46 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
>>> 
>>>> Edison,
>>>> 
>>>> I've got devcloud running along with the object_store branch and I've
>>>> finally been able to test a bit today.
>>>> 
>>>> I found some issues (or things that I think are bugs) and would like
>>>> to file a few. I know where the bug database is and I have an
>>>> account, but what is the best way to file bugs against this particular
>>>> branch? I guess I can select "Future" as the version? How else are
>>>> feature branches usually identified in issues? Perhaps in the subject?
>>>> Please let me know the preference.
>>>>
>>>> Also, can you describe (or point me at a document describing) the best
>>>> way to test against the object_store branch? So far I have been doing
>>>> the following, but I'm not sure it is the best:
>>>> 
>>>> a) setup devcloud.
>>>> b) stop any instances on devcloud from previous runs
>>>>     xe vm-shutdown --multiple
>>>> c) check out and update the object_store branch.
>>>> d) clean build as described in devcloud doc (ADIDD for short)
>>>> e) deploydb (ADIDD)
>>>> f) start management console (ADIDD) and wait for it.
>>>> g) deploysvr (ADIDD) in another shell.
>>>> h) on devcloud machine use xentop to wait for 2 vms to launch.
>>>>   (I'm not sure what the nfs vm is used for here??)
>>>> i) login on gui -> infra -> secondary and remove nfs secondary storage
>>>> j) add s3 secondary storage (using cache of old secondary storage?)
>>>> 
>>>> Then the rest of the testing starts from here... (and also perhaps in step j)
>>>> 
>>>> Thanks,
>>>> 
>>>> Tom.
>>>> -- 
>>>> Cloudian KK - http://www.cloudian.com/get-started.html
>>>> Fancy 100TB of full featured S3 Storage?
>>>> Checkout the Cloudian® Community Edition!
>>>> 
>>> 
>> 
>> -- 
>> Cloudian KK - http://www.cloudian.com/get-started.html
>> Fancy 100TB of full featured S3 Storage?
>> Checkout the Cloudian® Community Edition!
>> 
> 


Re: Object based Secondary storage.

Posted by Min Chen <mi...@citrix.com>.
Hi Tom,

	Thanks for your testing. Glad to hear that multipart is working fine
using Cloudian. Regarding your questions about the .gz template, that
behavior is as expected. We upload it to S3 in its .gz format. Only when
the template is used and downloaded to primary storage do we use the
staging area to decompress it.
	We will look at the bugs you filed and update them accordingly.

	-min

On 6/17/13 12:31 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:

>Thanks Min - I filed 3 small issues today. I've a couple more but I want
>to try and repeat them again before I file them and I've no time right
>now. Please let me know if you need any further detail on any of these.
>
>https://issues.apache.org/jira/browse/CLOUDSTACK-3027
>https://issues.apache.org/jira/browse/CLOUDSTACK-3028
>https://issues.apache.org/jira/browse/CLOUDSTACK-3030
>
>An example of the other issues I'm running into is that when I upload
>a .gz template on regular NFS storage, it is automatically decompressed
>for me, whereas with S3 the template remains a .gz file. Is this
>correct or not? Also, perhaps related: after successfully uploading
>the template to S3 and then trying to start an instance using it, I can
>select it and go all the way to the last screen (where I think the
>action button says launch instance or something), and it fails with a
>resource unreachable error. I'll have to dig up the error later and
>file the bug, as my machine got rebooted over the weekend.
>
>The multipart upload looks like it is working correctly, though, and I
>can verify that the checksums etc. match what they should be.
>
>Tom.
>
>On Fri, 2013-06-14 at 16:55 +0000, Min Chen wrote:
>> Hi Tom,
>>
>> 	You can file a JIRA ticket for the object_store branch by prefixing
>> your bug with "Object_Store_Refactor" and mentioning that it is using
>> a build from object_store. Here is an example bug filed by Sangeetha
>> against an object_store branch build:
>> https://issues.apache.org/jira/browse/CLOUDSTACK-2528.
>> 	If you use devcloud for testing, you may run into an issue where the
>> SSVM cannot access a public URL when you register a template, so
>> template registration will fail. You may have to set up an internal
>> web server inside devcloud and post the template to be registered
>> there, to give a URL that devcloud can access. We mainly used devcloud
>> to run our TestNG automation tests earlier, and then switched to a
>> real hypervisor for real testing.
>> 	Thanks
>> 	-min
>> 
>> On 6/14/13 1:46 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
>> 
>> >Edison,
>> >
>> >I've got devcloud running along with the object_store branch and I've
>> >finally been able to test a bit today.
>> >
>> >I found some issues (or things that I think are bugs) and would like
>> >to file a few. I know where the bug database is and I have an
>> >account, but what is the best way to file bugs against this particular
>> >branch? I guess I can select "Future" as the version? How else are
>> >feature branches usually identified in issues? Perhaps in the subject?
>> >Please let me know the preference.
>> >
>> >Also, can you describe (or point me at a document describing) the best
>> >way to test against the object_store branch? So far I have been doing
>> >the following, but I'm not sure it is the best:
>> >
>> > a) setup devcloud.
>> > b) stop any instances on devcloud from previous runs
>> >      xe vm-shutdown --multiple
>> > c) check out and update the object_store branch.
>> > d) clean build as described in devcloud doc (ADIDD for short)
>> > e) deploydb (ADIDD)
>> > f) start management console (ADIDD) and wait for it.
>> > g) deploysvr (ADIDD) in another shell.
>> > h) on devcloud machine use xentop to wait for 2 vms to launch.
>> >    (I'm not sure what the nfs vm is used for here??)
>> > i) login on gui -> infra -> secondary and remove nfs secondary storage
>> > j) add s3 secondary storage (using cache of old secondary storage?)
>> >
>> >Then the rest of the testing starts from here... (and also perhaps in step j)
>> >
>> >Thanks,
>> >
>> >Tom.
>> >-- 
>> >Cloudian KK - http://www.cloudian.com/get-started.html
>> >Fancy 100TB of full featured S3 Storage?
>> >Checkout the Cloudian® Community Edition!
>> >
>> 
>
>-- 
>Cloudian KK - http://www.cloudian.com/get-started.html
>Fancy 100TB of full featured S3 Storage?
>Checkout the Cloudian® Community Edition!
>


Re: Object based Secondary storage.

Posted by Thomas O'Dowd <tp...@cloudian.com>.
Thanks Min - I filed 3 small issues today. I've a couple more but I want
to try and repeat them again before I file them and I've no time right
now. Please let me know if you need any further detail on any of these.

https://issues.apache.org/jira/browse/CLOUDSTACK-3027
https://issues.apache.org/jira/browse/CLOUDSTACK-3028
https://issues.apache.org/jira/browse/CLOUDSTACK-3030

An example of the other issues I'm running into is that when I upload
a .gz template on regular NFS storage, it is automatically decompressed
for me, whereas with S3 the template remains a .gz file. Is this
correct or not? Also, perhaps related: after successfully uploading
the template to S3 and then trying to start an instance using it, I can
select it and go all the way to the last screen (where I think the
action button says launch instance or something), and it fails with a
resource unreachable error. I'll have to dig up the error later and
file the bug, as my machine got rebooted over the weekend.

The multipart upload looks like it is working correctly, though, and I
can verify that the checksums etc. match what they should be.

Tom.

On Fri, 2013-06-14 at 16:55 +0000, Min Chen wrote:
> Hi Tom,
>
> 	You can file a JIRA ticket for the object_store branch by prefixing
> your bug with "Object_Store_Refactor" and mentioning that it is using
> a build from object_store. Here is an example bug filed by Sangeetha
> against an object_store branch build:
> https://issues.apache.org/jira/browse/CLOUDSTACK-2528.
> 	If you use devcloud for testing, you may run into an issue where the
> SSVM cannot access a public URL when you register a template, so
> template registration will fail. You may have to set up an internal
> web server inside devcloud and post the template to be registered
> there, to give a URL that devcloud can access. We mainly used devcloud
> to run our TestNG automation tests earlier, and then switched to a
> real hypervisor for real testing.
> 	Thanks
> 	-min
> 
> On 6/14/13 1:46 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
> 
> >Edison,
> >
> >I've got devcloud running along with the object_store branch and I've
> >finally been able to test a bit today.
> >
> >I found some issues (or things that I think are bugs) and would like
> >to file a few. I know where the bug database is and I have an
> >account, but what is the best way to file bugs against this particular
> >branch? I guess I can select "Future" as the version? How else are
> >feature branches usually identified in issues? Perhaps in the subject?
> >Please let me know the preference.
> >
> >Also, can you describe (or point me at a document describing) the best
> >way to test against the object_store branch? So far I have been doing
> >the following, but I'm not sure it is the best:
> >
> > a) setup devcloud.
> > b) stop any instances on devcloud from previous runs
> >      xe vm-shutdown --multiple
> > c) check out and update the object_store branch.
> > d) clean build as described in devcloud doc (ADIDD for short)
> > e) deploydb (ADIDD)
> > f) start management console (ADIDD) and wait for it.
> > g) deploysvr (ADIDD) in another shell.
> > h) on devcloud machine use xentop to wait for 2 vms to launch.
> >    (I'm not sure what the nfs vm is used for here??)
> > i) login on gui -> infra -> secondary and remove nfs secondary storage
> > j) add s3 secondary storage (using cache of old secondary storage?)
> >
> >Then the rest of the testing starts from here... (and also perhaps in step j)
> >
> >Thanks,
> >
> >Tom.
> >-- 
> >Cloudian KK - http://www.cloudian.com/get-started.html
> >Fancy 100TB of full featured S3 Storage?
> >Checkout the Cloudian® Community Edition!
> >
> 

-- 
Cloudian KK - http://www.cloudian.com/get-started.html
Fancy 100TB of full featured S3 Storage?
Checkout the Cloudian® Community Edition!


Re: Object based Secondary storage.

Posted by Min Chen <mi...@citrix.com>.
Hi Tom,

	You can file a JIRA ticket for the object_store branch by prefixing
your bug with "Object_Store_Refactor" and mentioning that it is using
a build from object_store. Here is an example bug filed by Sangeetha
against an object_store branch build:
https://issues.apache.org/jira/browse/CLOUDSTACK-2528.
	If you use devcloud for testing, you may run into an issue where the
SSVM cannot access a public URL when you register a template, so
template registration will fail. You may have to set up an internal
web server inside devcloud and post the template to be registered
there, to give a URL that devcloud can access. We mainly used devcloud
to run our TestNG automation tests earlier, and then switched to a
real hypervisor for real testing.
	Thanks
	-min

On 6/14/13 1:46 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:

>Edison,
>
>I've got devcloud running along with the object_store branch and I've
>finally been able to test a bit today.
>
>I found some issues (or things that I think are bugs) and would like
>to file a few. I know where the bug database is and I have an
>account, but what is the best way to file bugs against this particular
>branch? I guess I can select "Future" as the version? How else are
>feature branches usually identified in issues? Perhaps in the subject?
>Please let me know the preference.
>
>Also, can you describe (or point me at a document describing) the best
>way to test against the object_store branch? So far I have been doing
>the following, but I'm not sure it is the best:
>
> a) setup devcloud.
> b) stop any instances on devcloud from previous runs
>      xe vm-shutdown --multiple
> c) check out and update the object_store branch.
> d) clean build as described in devcloud doc (ADIDD for short)
> e) deploydb (ADIDD)
> f) start management console (ADIDD) and wait for it.
> g) deploysvr (ADIDD) in another shell.
> h) on devcloud machine use xentop to wait for 2 vms to launch.
>    (I'm not sure what the nfs vm is used for here??)
> i) login on gui -> infra -> secondary and remove nfs secondary storage
> j) add s3 secondary storage (using cache of old secondary storage?)
>
>Then the rest of the testing starts from here... (and also perhaps in step j)
>
>Thanks,
>
>Tom.
>-- 
>Cloudian KK - http://www.cloudian.com/get-started.html
>Fancy 100TB of full featured S3 Storage?
>Checkout the Cloudian® Community Edition!
>


Re: Object based Secondary storage.

Posted by Thomas O'Dowd <tp...@cloudian.com>.
Edison,

I've got devcloud running along with the object_store branch and I've
finally been able to test a bit today.

I found some issues (or things that I think are bugs) and would like to
file a few. I know where the bug database is and I have an account, but
what is the best way to file bugs against this particular branch? I
guess I can select "Future" as the version? How else are feature
branches usually identified in issues? Perhaps in the subject? Please
let me know the preference.

Also, can you describe (or point me at a document describing) the best
way to test against the object_store branch? So far I have been doing
the following, but I'm not sure it is the best:

 a) setup devcloud.
 b) stop any instances on devcloud from previous runs
      xe vm-shutdown --multiple
 c) check out and update the object_store branch.
 d) clean build as described in devcloud doc (ADIDD for short)
 e) deploydb (ADIDD)
 f) start management console (ADIDD) and wait for it.
 g) deploysvr (ADIDD) in another shell.
 h) on devcloud machine use xentop to wait for 2 vms to launch.
    (I'm not sure what the nfs vm is used for here??)
 i) login on gui -> infra -> secondary and remove nfs secondary storage
 j) add s3 secondary storage (using cache of old secondary storage?)

Then the rest of the testing starts from here... (and also perhaps in step j)

Thanks,

Tom.
-- 
Cloudian KK - http://www.cloudian.com/get-started.html
Fancy 100TB of full featured S3 Storage?
Checkout the Cloudian® Community Edition!


Re: Object based Secondary storage.

Posted by John Burwell <jb...@basho.com>.
Edison,

It appears that the S3 clients have a quirk in their behavior for multi-part uploads.  I have created a defect for Riak CS (https://github.com/basho/riak_cs/issues/585).  Once a patch has been merged into master, I will provide instructions for building from source (it is very easy), and we can move forward.  Until the patch is available, I recommend configuring TransferManager with a high multi-part upload threshold (4.5 GB should do the trick) and using files smaller than that threshold.
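
For anyone following along, that configuration looks roughly like this (a sketch only -- note that some SDK versions declare the threshold setter with an int parameter, which caps it near 2 GB, so check the signature in the SDK version you are using):

    import com.amazonaws.services.s3.transfer.TransferManager;
    import com.amazonaws.services.s3.transfer.TransferManagerConfiguration;

    // "tm" stands for the TransferManager instance in use.
    TransferManagerConfiguration config = new TransferManagerConfiguration();
    // With the threshold above the file size, the upload becomes a single
    // PUT, whose ETag Riak CS reports as a plain MD5 that tools can verify.
    config.setMultipartUploadThreshold(4500L * 1024 * 1024); // ~4.5 GB
    tm.setConfiguration(config);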

Thanks for running down this issue.  As I said, it is unexpected behavior, but in discussing it, it seems the quickest remedy is to have Riak CS emulate the quirk.
-John

On Jun 7, 2013, at 1:23 PM, Edison Su <Ed...@citrix.com> wrote:

> 
> 
>> -----Original Message-----
>> From: John Burwell [mailto:jburwell@basho.com]
>> Sent: Friday, June 07, 2013 7:54 AM
>> To: dev@cloudstack.apache.org
>> Cc: Kelly McLaughlin
>> Subject: Re: Object based Secondary storage.
>> 
>> Thomas,
>> 
>> The AWS API explicitly states the ETag is not guaranteed to be an integrity
>> hash [1].  According to RFC 2616 [2], clients should not infer any meaning to
>> the content of an ETag.  Essentially, it is an opaque version identifier which
>> should only be compared for equality to another ETag value to detect a
>> resource change.  As such, I agree with your assessment that s3cmd is
>> making an invalid assumption regarding the value of the ETag.
> 
> 
> Not only s3cmd, but the Amazon S3 Java SDK also makes the "invalid" assumption.
> What's your opinion on how to solve the SDK incompatibility issue?
> 
>> 
>> Min, could you please send the stack trace you are receiving from
>> TransferManager?  Also, could you send a reference to the code in the Git
>> repo?  With that information, we can start to run down the source of the problem.
>> 
>> Thanks,
>> -John
>> 
>> [1]: http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
>> [2]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
>> 
>> On Jun 7, 2013, at 1:08 AM, Thomas O'Dowd <tp...@cloudian.com>
>> wrote:
>> 
>>> Min,
>>> 
>>> This looks like an s3cmd problem. I just downloaded the latest s3cmd
>>> to check the source code.
>>> 
>>> In S3/FileLists.py:
>>> 
>>>       compare_md5 = 'md5' in cfg.sync_checks
>>>       # Multipart-uploaded files don't have a valid md5 sum - it ends with "...-nn"
>>>       if compare_md5:
>>>           if (src_remote == True and src_list[file]['md5'].find("-") >= 0) or (dst_remote == True and dst_list[file]['md5'].find("-") >= 0):
>>> 
>>> Basically, s3cmd tries to verify that the checksum of the data it
>>> downloads matches the etag, unless the etag ends with "-YYY". That is
>>> an AWS convention (as I mentioned in an earlier mail), so it works for
>>> AWS, but RiakCS has a different ETAG format which doesn't match -YYY,
>>> so s3cmd assumes the other type of ETAG, i.e. one equal to the MD5
>>> checksum of the content. For RiakCS, however, this is not the case,
>>> which is why you get the checksum error.
>>> 
>>> Chances are that Riak is doing the right thing here and the data file
>>> will be the same as what you uploaded. You could change the s3cmd code
>>> to be more lenient for Riak. The Basho guys might either like to
>>> change their format or talk to the different tool vendors about
>>> changing the tools to work with Riak. For Cloudian, we chose to keep
>>> it similar to AWS so we could avoid stuff like this.
>>> 
>>> Tom.
>>> 
>>> On Fri, 2013-06-07 at 04:02 +0000, Min Chen wrote:
>>>> John,
>>>> We are not able to successfully download a file that was uploaded to
>>>> Riak CS with TransferManager using s3cmd. Same error as we encountered
>>>> using the Amazon S3 Java client, due to the incompatible ETAG format
>>>> (the "-" vs "_" difference).
>>>> 
>>>> Thanks
>>>> -min
>>>> 
>>>> 
>>>> 
>>>> On Jun 6, 2013, at 5:40 PM, "John Burwell" <jb...@basho.com> wrote:
>>>> 
>>>>> Edison,
>>>>> 
>>>>> Riak CS and S3 seed their hashes differently -- causing the form to
>>>>> appear slightly different.  In particular, Riak CS uses URI-safe base64
>>>>> encoding, which explains why its ETag values contain "_"s.  From a
>>>>> client perspective, the ETags are treated as opaque strings that are
>>>>> passed through to the server for processing and compared strictly for
>>>>> equality.  Therefore, the form of the hash will not cause the client to
>>>>> choke, and the Riak CS behavior you are seeing is S3 API compatible (see
>>>>> http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html for
>>>>> more details).
>>>>>
>>>>> Were you able to successfully download the file from Riak CS using
>>>>> s3cmd?
>>>>> 
>>>>> Thanks,
>>>>> -John
>>>>> 
>>>>> 
>>>>> On Jun 6, 2013, at 6:57 PM, Edison Su <Ed...@citrix.com> wrote:
>>>>> 
>>>>>> The ETags created by RIAK CS and Amazon S3 seem a little bit different in the case of multipart upload.
>>>>>> 
>>>>>> Here is the result I tested on both RIAK CS and Amazon S3, with s3cmd.
>>>>>> Test environment:
>>>>>> S3cmd: version 1.5.0-alpha1
>>>>>> Riak cs:
>>>>>> Name        : riak
>>>>>> Arch        : x86_64
>>>>>> Version     : 1.3.1
>>>>>> Release     : 1.el6
>>>>>> Size        : 40 M
>>>>>> Repo        : installed
>>>>>> From repo   : basho-products
>>>>>> 
>>>>>> The command I used to put:
>>>>>> s3cmd put some-file s3://some-path --multipart-chunk-size-mb=100 -v -d
>>>>>> 
>>>>>> The etag created for the file, when using Riak CS, is WxEUkiQzTWm_2C8A92fLQg==
>>>>>> 
>>>>>> DEBUG: Sending request method_string='POST', uri='http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test?uploadId=kfDkh7Q_QCWN7r0ZTqNq4Q==', headers={'content-length': '309', 'Authorization': 'AWS OYAZXCAFUC1DAFOXNJWI:xlkHI9tUfUV/N+Ekqpi7Jz/pbOI=', 'x-amz-date': 'Thu, 06 Jun 2013 22:54:28 +0000'}, body=(309 bytes)
>>>>>> DEBUG: Response: {'status': 200, 'headers': {'date': 'Thu, 06 Jun 2013 22:40:09 GMT', 'content-length': '326', 'content-type': 'application/xml', 'server': 'Riak CS'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?><CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test</Location><Bucket>imagestore</Bucket><Key>tmpl/1/1/routing-1/test</Key><ETag>kfDkh7Q_QCWN7r0ZTqNq4Q==</ETag></CompleteMultipartUploadResult>'}
>>>>>> 
>>>>>> While the etag created by Amazon S3 is: &quot;70e1860be687d43c039873adef4280f2-3&quot;
>>>>>>
>>>>>> DEBUG: Sending request method_string='POST', uri='/fixes/icecake/systdfdfdfemvm.iso1?uploadId=vdkPSAtaA7g.fdfdfdfdf..iaKRNW_8QGz.bXdfdfdfdfdfkFXwUwLzRcG5obVvJFDvnhYUFdT6fYr1rig--',
>>>>>> DEBUG: Response: {'status': 200, 'headers': {, 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'Keep-Alive', 'x-amz-request-id': '8DFF5D8025E58E99', 'cache-control': 'proxy-revalidate', 'date': 'Thu, 06 Jun 2013 22:39:47 GMT', 'content-type': 'application/xml'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?>\n\n<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://fdfdfdfdfdfdf</Location>Key>fixes/icecake/systemvm.iso1</Key><ETag>&quot;70e1860be687d43c039873adef4280f2-3&quot;</ETag></CompleteMultipartUploadResult>'}
>>>>>> 
>>>>>> So the ETag created on Amazon S3 has a "-" (dash) in it, but there is only a "_" (underscore) on Riak CS.
>>>>>>
>>>>>> Do you know the reason? What do we need to do to make it compatible with the Amazon S3 SDK?
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: John Burwell [mailto:jburwell@basho.com]
>>>>>>> Sent: Thursday, June 06, 2013 2:03 PM
>>>>>>> To: dev@cloudstack.apache.org
>>>>>>> Subject: Re: Object based Secondary storage.
>>>>>>> 
>>>>>>> Min,
>>>>>>> 
>>>>>>> Are you calculating the MD5 or letting the Amazon client do it?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> -John
>>>>>>> 
>>>>>>> On Jun 6, 2013, at 4:54 PM, Min Chen <mi...@citrix.com> wrote:
>>>>>>> 
>>>>>>>> Thanks Tom. Indeed I have an S3 question that needs some advice
>>>>>>>> from some S3 experts. To support uploading objects > 5G, I have used
>>>>>>>> TransferManager.upload to upload objects to S3; the upload went fine
>>>>>>>> and objects were successfully put to S3. However, later on when I
>>>>>>>> am using "s3cmd get <object key>" to retrieve such an object, I
>>>>>>>> always got this exception:
>>>>>>>>
>>>>>>>> "MD5 signatures do not match: computed=Y, received=X"
>>>>>>>>
>>>>>>>> It seems that Amazon S3 keeps a different MD5 sum for a
>>>>>>>> multi-part uploaded object. We have been using Riak CS for our S3
>>>>>>>> testing. If I change to not using multi-part upload and directly
>>>>>>>> invoke S3 putObject, I do not run into this issue. Have you seen
>>>>>>>> this before?
>>>>>>>> 
>>>>>>>> -min
>>>>>>>> 
>>>>>>>> On 6/6/13 1:56 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
>>>>>>>> 
>>>>>>>>> Thanks Min. I've printed out the material and am reading new threads.
>>>>>>>>> Can't comment much yet until I understand things a bit more.
>>>>>>>>> 
>>>>>>>>> Meanwhile, feel free to hit me up with any S3 questions you
>>>>>>>>> have. I'm looking forward to playing with the object_store
>>>>>>>>> branch and testing it out.
>>>>>>>>> 
>>>>>>>>> Tom.
>>>>>>>>> 
>>>>>>>>> On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
>>>>>>>>>> Welcome Tom. You can check out this FS
>>>>>>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Object+Store+Plugin+Framework
>>>>>>>>>> for the secondary storage architectural work done in the
>>>>>>>>>> object_store branch. You may also check out the following recent
>>>>>>>>>> threads regarding 3 major technical questions raised by the
>>>>>>>>>> community as well as our answers and clarification.
>>>>>>>>>>
>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77B337AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
>>>>>>>>>>
>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD22955.3DDDC%25min.chen%40citrix.com%3E
>>>>>>>>>>
>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2300D.3DE0C%25min.chen%40citrix.com%3E
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> That branch is mainly worked on by Edison and me, and we are at
>>>>>>>>>> PST timezone.
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> -min
>>>>>>>>> --
>>>>>>>>> Cloudian KK - http://www.cloudian.com/get-started.html
>>>>>>>>> Fancy 100TB of full featured S3 Storage?
>>>>>>>>> Checkout the Cloudian(r) Community Edition!
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> --
>>> Cloudian KK - http://www.cloudian.com/get-started.html
>>> Fancy 100TB of full featured S3 Storage?
>>> Checkout the Cloudian(r) Community Edition!
>>> 
> 


RE: Object based Secondary storage.

Posted by Edison Su <Ed...@citrix.com>.

> -----Original Message-----
> From: John Burwell [mailto:jburwell@basho.com]
> Sent: Friday, June 07, 2013 7:54 AM
> To: dev@cloudstack.apache.org
> Cc: Kelly McLaughlin
> Subject: Re: Object based Secondary storage.
> 
> Thomas,
> 
> The AWS API explicitly states the ETag is not guaranteed to be an integrity
> hash [1].  According to RFC 2616 [2], clients should not infer any meaning to
> the content of an ETag.  Essentially, it is an opaque version identifier which
> should only be compared for equality to another ETag value to detect a
> resource change.  As such, I agree with your assessment that s3cmd is
> making an invalid assumption regarding the value of the ETag.


Not only s3cmd, but the Amazon S3 Java SDK also makes the same "invalid" assumption.
What's your opinion on how to solve the SDK incompatibility issue?

> 
> Min, could you please send the stack trace you are receiving from
> TransferManager?  Also, could you send a reference to the code in the Git repo?
> With that information, we can start running down the source of the problem.
> 
> Thanks,
> -John
> 
> [1]: http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
> [2]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
> 
> On Jun 7, 2013, at 1:08 AM, Thomas O'Dowd <tp...@cloudian.com>
> wrote:
> 
> > Min,
> >
> > This looks like an s3cmd problem. I just downloaded the latest s3cmd
> > to check the source code.
> >
> > In S3/FileLists.py:
> >
> >        compare_md5 = 'md5' in cfg.sync_checks
> >        # Multipart-uploaded files don't have a valid md5 sum - it ends with "...-nn"
> >        if compare_md5:
> >            if (src_remote == True and src_list[file]['md5'].find("-") >= 0) or (dst_remote == True and dst_list[file]['md5'].find("-") >= 0):
> >
> > Basically, s3cmd is trying to verify that the checksum of the data
> > that it downloads is the same as the etag unless the etag ends with "-YYY".
> > This is an AWS convention (as I mentioned in an earlier mail) so it
> > works but it seems that RiakCS has a different ETAG format which
> > doesn't match -YYY so s3cmd assumes the other type of ETAG which is
> > the same as the MD5 checksum. For RiakCS however, this is not the
> > case. This is why you get the checksum error.
> >
> > Chances are that Riak is doing the right thing here and the data file
> > will be the same as what you uploaded. You could change the s3cmd code
> > to be more lenient for Riak. The Basho guys might either like to
> > change their format or talk to the different tool vendors about
> > changing the tools to work with Riak. For Cloudian, we choose to try
> > to keep it similar to AWS so we could avoid stuff like this.
> >
> > Tom.
> >
> > On Fri, 2013-06-07 at 04:02 +0000, Min Chen wrote:
> >> John,
> >>  We are not able to successfully download file that was uploaded to Riak
> CS with TransferManager using S3cmd. Same error as we encountered using
> amazon s3 java client due to the incompatible ETAG format ( - and _
> difference).
> >>
> >> Thanks
> >> -min
> >>
> >>
> >>
> >> On Jun 6, 2013, at 5:40 PM, "John Burwell" <jb...@basho.com> wrote:
> >>
> >>> Edison,
> >>>
> >>> Riak CS and S3 seed their hashes differently -- causing the form to
> appear slightly different.  In particular, Riak CS uses URI-safe base64 encoding
> which explains why the ETag values contain "_"s instead of "-"s.  From a client
> perspective, the ETags are treated as opaque strings that are passed through
> to the server for processing and compared strictly for equality.  Therefore,
> the form of the hash will not cause the client to choke, and the Riak CS
> behavior you are seeing is S3 API compatible (see
> http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html for
> more details).
> >>>
> >>> Were you able to successfully download the file from Riak CS using
> s3cmd?
> >>>
> >>> Thanks,
> >>> -John
> >>>
> >>>
> >>> On Jun 6, 2013, at 6:57 PM, Edison Su <Ed...@citrix.com> wrote:
> >>>
> >>>> The Etag created by both RIAK CS and Amazon S3 seems a little bit
> different, in case of multi part upload.
> >>>>
> >>>> Here is the result I tested on both RIAK CS and Amazon S3, with s3cmd.
> >>>> Test environment:
> >>>> S3cmd: version: version 1.5.0-alpha1
> >>>> Riak cs:
> >>>> Name        : riak
> >>>> Arch        : x86_64
> >>>> Version     : 1.3.1
> >>>> Release     : 1.el6
> >>>> Size        : 40 M
> >>>> Repo        : installed
> >>>> From repo   : basho-products
> >>>>
> >>>> The command I used to put:
> >>>> s3cmd put some-file s3://some-path --multipart-chunk-size-mb=100 -v
> >>>> -d
> >>>>
> >>>> The etag created for the file, when using Riak CS is
> >>>> WxEUkiQzTWm_2C8A92fLQg==
> >>>>
> >>>> EBUG: Sending request method_string='POST', uri='http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test?uploadId=kfDkh7Q_QCWN7r0ZTqNq4Q==', headers={'content-length': '309', 'Authorization': 'AWS OYAZXCAFUC1DAFOXNJWI:xlkHI9tUfUV/N+Ekqpi7Jz/pbOI=', 'x-amz-date': 'Thu, 06 Jun 2013 22:54:28 +0000'}, body=(309 bytes)
> >>>> DEBUG: Response: {'status': 200, 'headers': {'date': 'Thu, 06 Jun 2013 22:40:09 GMT', 'content-length': '326', 'content-type': 'application/xml', 'server': 'Riak CS'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?><CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test</Location><Bucket>imagestore</Bucket><Key>tmpl/1/1/routing-1/test</Key><ETag>kfDkh7Q_QCWN7r0ZTqNq4Q==</ETag></CompleteMultipartUploadResult>'}
> >>>>
> >>>> While the etag created by Amazon S3 is:
> >>>> &quot;70e1860be687d43c039873adef4280f2-3&quot;
> >>>>
> >>>> DEBUG: Sending request method_string='POST', uri='/fixes/icecake/systdfdfdfemvm.iso1?uploadId=vdkPSAtaA7g.fdfdfdfdf..iaKRNW_8QGz.bXdfdfdfdfdfkFXwUwLzRcG5obVvJFDvnhYUFdT6fYr1rig--',
> >>>> DEBUG: Response: {'status': 200, 'headers': {, 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'Keep-Alive', 'x-amz-request-id': '8DFF5D8025E58E99', 'cache-control': 'proxy-revalidate', 'date': 'Thu, 06 Jun 2013 22:39:47 GMT', 'content-type': 'application/xml'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?>\n\n<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://fdfdfdfdfdfdf</Location>Key>fixes/icecake/systemvm.iso1</Key><ETag>&quot;70e1860be687d43c039873adef4280f2-3&quot;</ETag></CompleteMultipartUploadResult>'}
> >>>>
> >>>> So the etag created on Amazon S3 has "-"(dash) in it, but there is only "_" (underscore) on Riak cs.
> >>>>
> >>>> Do you know the reason? What should we need to do to make it compatible with Amazon S3 SDK?
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: John Burwell [mailto:jburwell@basho.com]
> >>>>> Sent: Thursday, June 06, 2013 2:03 PM
> >>>>> To: dev@cloudstack.apache.org
> >>>>> Subject: Re: Object based Secondary storage.
> >>>>>
> >>>>> Min,
> >>>>>
> >>>>> Are you calculating the MD5 or letting the Amazon client do it?
> >>>>>
> >>>>> Thanks,
> >>>>> -John
> >>>>>
> >>>>> On Jun 6, 2013, at 4:54 PM, Min Chen <mi...@citrix.com> wrote:
> >>>>>
> >>>>>> Thanks Tom. Indeed I have a S3 question that need some advise
> >>>>>> from some S3 experts. To support upload object > 5G, I have used
> >>>>>> TransferManager.upload to upload object to S3, upload went fine
> >>>>>> and object are successfully put to S3. However, later on when I
> >>>>>> am using "s3cmd get <object key>" to retrieve this object, I always
> got this exception:
> >>>>>>
> >>>>>> "MD5 signatures do not match: computed=Y, received="X"
> >>>>>>
> >>>>>> It seems that Amazon S3 kept a different Md5 sum for the
> >>>>>> multi-part uploaded object. We have been using Riak CS for our S3
> >>>>>> testing. If I changed to not using multi-part upload and directly
> >>>>>> invoking S3 putObject, I will not run into this issue. Do you
> >>>>>> have such experience
> >>>>> before?
> >>>>>>
> >>>>>> -min
> >>>>>>
> >>>>>> On 6/6/13 1:56 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
> >>>>>>
> >>>>>>> Thanks Min. I've printed out the material and am reading new threads.
> >>>>>>> Can't comment much yet until I understand things a bit more.
> >>>>>>>
> >>>>>>> Meanwhile, feel free to hit me up with any S3 questions you
> >>>>>>> have. I'm looking forward to playing with the object_store
> >>>>>>> branch and testing it out.
> >>>>>>>
> >>>>>>> Tom.
> >>>>>>>
> >>>>>>> On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
> >>>>>>>> Welcome Tom. You can check out this FS
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Object+Store+Plugin+Framework
> >>>>>>>> for secondary storage architectural work done in
> >>>>>>>> object_store branch. You may also check out the following recent
> >>>>>>>> threads regarding 3 major technical questions raised by
> >>>>>>>> community as well as our answers and clarification.
> >>>>>>>>
> >>>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77B337AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
> >>>>>>>>
> >>>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD22955.3DDDC%25min.chen%40citrix.com%3E
> >>>>>>>>
> >>>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2300D.3DE0C%25min.chen%40citrix.com%3E
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> That branch is mainly worked on by Edison and me, and we are at
> >>>>>>>> PST timezone.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> -min
> >>>>>>> --
> >>>>>>> Cloudian KK - http://www.cloudian.com/get-started.html
> >>>>>>> Fancy 100TB of full featured S3 Storage?
> >>>>>>> Checkout the Cloudian(r) Community Edition!
> >>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> >
> > --
> > Cloudian KK - http://www.cloudian.com/get-started.html
> > Fancy 100TB of full featured S3 Storage?
> > Checkout the Cloudian(r) Community Edition!
> >


Re: Object based Secondary storage.

Posted by Min Chen <mi...@citrix.com>.
Hi John,
Although the AWS API states that the ETag is not guaranteed to be an integrity hash, the SDK's internal code assumes a special ETag format for objects uploaded through multi-part mechanisms such as TransferManager. This is reflected in AmazonS3Client's implementation of "getObject" below:

     /* (non-Javadoc)
      * @see com.amazonaws.services.s3.AmazonS3#getObject(com.amazonaws.services.s3.model.GetObjectRequest, java.io.File)
      */
     public ObjectMetadata getObject(GetObjectRequest getObjectRequest, File destinationFile)
             throws AmazonClientException, AmazonServiceException {
         assertParameterNotNull(destinationFile,
                 "The destination file parameter must be specified when downloading an object directly to a file");

         S3Object s3Object = getObject(getObjectRequest);
         // getObject can return null if constraints were specified but not met
         if (s3Object == null) return null;

         ServiceUtils.downloadObjectToFile(s3Object, destinationFile, (getObjectRequest.getRange() == null));

         return s3Object.getObjectMetadata();
     }

And in ServiceUtils.downloadObjectToFile, the SDK determines whether an ETag was generated by a multipart upload through the following routine:


    /**
     * Returns true if the specified ETag was from a multipart upload.
     *
     * @param eTag
     *            The ETag to test.
     *
     * @return True if the specified ETag was from a multipart upload, otherwise
     *         false it if belongs to an object that was uploaded in a single
     *         part.
     */
    public static boolean isMultipartUploadETag(String eTag) {
        return eTag.contains("-");
    }

As you can see, it assumes that a multipart-upload ETag contains a dash "-" rather than an underscore "_". The ETag that Riak CS generated for my S3 object uploaded through TransferManager does not follow this convention, so that check fails, and the download then fails the integrity check because the ETag is not an actual MD5 sum. Specifically, this happens in the following code snippet from ServiceUtils.downloadObjectToFile:


        try {
            // Multipart Uploads don't have an MD5 calculated on the service side
            if (ServiceUtils.isMultipartUploadETag(s3Object.getObjectMetadata().getETag()) == false) {
                clientSideHash = Md5Utils.computeMD5Hash(new FileInputStream(destinationFile));
                serverSideHash = BinaryUtils.fromHex(s3Object.getObjectMetadata().getETag());
            }
        } catch (Exception e) {
            log.warn("Unable to calculate MD5 hash to validate download: " + e.getMessage(), e);
        }

        if (performIntegrityCheck && clientSideHash != null && serverSideHash != null && !Arrays.equals(clientSideHash, serverSideHash)) {
            throw new AmazonClientException("Unable to verify integrity of data download.  " +
                    "Client calculated content hash didn't match hash calculated by Amazon S3.  " +
                    "The data stored in '" + destinationFile.getAbsolutePath() + "' may be corrupt.");
        }

If you want to check how we upload the file to RIAK CS using multi-part upload, you can check the code at Git repo: https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;a=blob;f=core/src/com/cloud/storage/template/S3TemplateDownloader.java;h=ca0df5d515e900c5313ccb14e962aa72c0785b84;hb=refs/heads/object_store.
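
As a possible client-side workaround (just a sketch, not code from the object_store branch; the class, method, and variable names are made up), one could skip ServiceUtils.downloadObjectToFile entirely and stream the S3Object content to disk directly, so no ETag-derived MD5 comparison is ever attempted:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.model.GetObjectRequest;
    import com.amazonaws.services.s3.model.S3Object;

    public class PlainDownload {
        // Copies the object to a local file without any client-side
        // integrity check, sidestepping isMultipartUploadETag() altogether.
        public static void downloadWithoutEtagCheck(AmazonS3 s3, String bucket,
                String key, File dest) throws Exception {
            S3Object object = s3.getObject(new GetObjectRequest(bucket, key));
            InputStream in = object.getObjectContent();
            OutputStream out = new FileOutputStream(dest);
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } finally {
                in.close();
                out.close();
            }
        }
    }

The obvious trade-off is that single-part downloads would lose the MD5 verification the SDK otherwise performs, so a caller might still want to compare sizes or a separately stored checksum.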

Thanks
-min


On 6/7/13 7:53 AM, "John Burwell" <jb...@basho.com>> wrote:

Thomas,

The AWS API explicitly states the ETag is not guaranteed to be an integrity hash [1].  According to RFC 2616 [2], clients should not infer any meaning to the content of an ETag.  Essentially, it is an opaque version identifier which should only be compared for equality to another ETag value to detect a resource change.  As such, I agree with your assessment that s3cmd is making an invalid assumption regarding the value of the ETag.

Min, could you please send the stack trace you are receiving from TransferManager?  Also, could you send a reference to the code in the Git repo?  With that information, we can start running down the source of the problem.

Thanks,
-John

[1]: http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
[2]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

On Jun 7, 2013, at 1:08 AM, Thomas O'Dowd <tp...@cloudian.com>> wrote:

Min,
This looks like an s3cmd problem. I just downloaded the latest s3cmd to
check the source code.
In S3/FileLists.py:
        compare_md5 = 'md5' in cfg.sync_checks
        # Multipart-uploaded files don't have a valid md5 sum - it ends with "...-nn"
        if compare_md5:
            if (src_remote == True and src_list[file]['md5'].find("-") >= 0) or (dst_remote == True and dst_list[file]['md5'].find("-") >= 0):
Basically, s3cmd is trying to verify that the checksum of the data that
it downloads is the same as the etag unless the etag ends with "-YYY".
This is an AWS convention (as I mentioned in an earlier mail) so it
works but it seems that RiakCS has a different ETAG format which doesn't
match -YYY so s3cmd assumes the other type of ETAG which is the same as
the MD5 checksum. For RiakCS however, this is not the case. This is why
you get the checksum error.
Chances are that Riak is doing the right thing here and the data file
will be the same as what you uploaded. You could change the s3cmd code
to be more lenient for Riak. The Basho guys might either like to change
their format or talk to the different tool vendors about changing the
tools to work with Riak. For Cloudian, we choose to try to keep it
similar to AWS so we could avoid stuff like this.
Tom.
On Fri, 2013-06-07 at 04:02 +0000, Min Chen wrote:
John,
  We are not able to successfully download file that was uploaded to Riak CS with TransferManager using S3cmd. Same error as we encountered using amazon s3 java client due to the incompatible ETAG format ( - and _ difference).
Thanks
-min
On Jun 6, 2013, at 5:40 PM, "John Burwell" <jb...@basho.com>> wrote:
Edison,
Riak CS and S3 seed their hashes differently -- causing the form to appear slightly different.  In particular, Riak CS uses URI-safe base64 encoding which explains why the ETag values contain "_"s instead of "-"s.  From a client perspective, the ETags are treated as opaque strings that are passed through to the server for processing and compared strictly for equality.  Therefore, the form of the hash will not cause the client to choke, and the Riak CS behavior you are seeing is S3 API compatible (see http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html for more details).
Were you able to successfully download the file from Riak CS using s3cmd?
Thanks,
-John
On Jun 6, 2013, at 6:57 PM, Edison Su <Ed...@citrix.com>> wrote:
The Etag created by both RIAK CS and Amazon S3 seems a little bit different, in case of multi part upload.
Here is the result I tested on both RIAK CS and Amazon S3, with s3cmd.
Test environment:
S3cmd: version: version 1.5.0-alpha1
Riak cs:
Name        : riak
Arch        : x86_64
Version     : 1.3.1
Release     : 1.el6
Size        : 40 M
Repo        : installed
From repo   : basho-products
The command I used to put:
s3cmd put some-file s3://some-path --multipart-chunk-size-mb=100 -v -d
The etag created for the file, when using Riak CS is WxEUkiQzTWm_2C8A92fLQg==
EBUG: Sending request method_string='POST', uri='http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test?uploadId=kfDkh7Q_QCWN7r0ZTqNq4Q==', headers={'content-length': '309', 'Authorization': 'AWS OYAZXCAFUC1DAFOXNJWI:xlkHI9tUfUV/N+Ekqpi7Jz/pbOI=', 'x-amz-date': 'Thu, 06 Jun 2013 22:54:28 +0000'}, body=(309 bytes)
DEBUG: Response: {'status': 200, 'headers': {'date': 'Thu, 06 Jun 2013 22:40:09 GMT', 'content-length': '326', 'content-type': 'application/xml', 'server': 'Riak CS'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?><CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test</Location><Bucket>imagestore</Bucket><Key>tmpl/1/1/routing-1/test</Key><ETag>kfDkh7Q_QCWN7r0ZTqNq4Q==</ETag></CompleteMultipartUploadResult>'}
While the etag created by Amazon S3 is: &quot;70e1860be687d43c039873adef4280f2-3&quot;
DEBUG: Sending request method_string='POST', uri='/fixes/icecake/systdfdfdfemvm.iso1?uploadId=vdkPSAtaA7g.fdfdfdfdf..iaKRNW_8QGz.bXdfdfdfdfdfkFXwUwLzRcG5obVvJFDvnhYUFdT6fYr1rig--',
DEBUG: Response: {'status': 200, 'headers': {, 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'Keep-Alive', 'x-amz-request-id': '8DFF5D8025E58E99', 'cache-control': 'proxy-revalidate', 'date': 'Thu, 06 Jun 2013 22:39:47 GMT', 'content-type': 'application/xml'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?>\n\n<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://fdfdfdfdfdfdf</Location>Key>fixes/icecake/systemvm.iso1</Key><ETag>&quot;70e1860be687d43c039873adef4280f2-3&quot;</ETag></CompleteMultipartUploadResult>'}
So the etag created on Amazon S3 has "-"(dash) in it, but there is only "_" (underscore) on Riak cs.
Do you know the reason? What should we need to do to make it compatible with Amazon S3 SDK?
-----Original Message-----
From: John Burwell [mailto:jburwell@basho.com]
Sent: Thursday, June 06, 2013 2:03 PM
To: dev@cloudstack.apache.org<ma...@cloudstack.apache.org>
Subject: Re: Object based Secondary storage.
Min,
Are you calculating the MD5 or letting the Amazon client do it?
Thanks,
-John
On Jun 6, 2013, at 4:54 PM, Min Chen <mi...@citrix.com>> wrote:
Thanks Tom. Indeed I have a S3 question that need some advise from
some S3 experts. To support upload object > 5G, I have used
TransferManager.upload to upload object to S3, upload went fine and
object are successfully put to S3. However, later on when I am using
"s3cmd get <object key>" to retrieve this object, I always got this exception:
"MD5 signatures do not match: computed=Y, received="X"
It seems that Amazon S3 kept a different Md5 sum for the multi-part
uploaded object. We have been using Riak CS for our S3 testing. If I
changed to not using multi-part upload and directly invoking S3
putObject, I will not run into this issue. Do you have such experience
before?
-min
On 6/6/13 1:56 AM, "Thomas O'Dowd" <tp...@cloudian.com>> wrote:
Thanks Min. I've printed out the material and am reading new threads.
Can't comment much yet until I understand things a bit more.
Meanwhile, feel free to hit me up with any S3 questions you have. I'm
looking forward to playing with the object_store branch and testing
it out.
Tom.
On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
Welcome Tom. You can check out this FS
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Object+Store+Plugin+Framework
for secondary storage architectural work done in
object_store branch. You may also check out the following recent
threads regarding 3 major technical questions raised by community as
well as our answers and clarification.
http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77B337AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD22955.3DDDC%25min.chen%40citrix.com%3E
http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2300D.3DE0C%25min.chen%40citrix.com%3E
That branch is mainly worked on by Edison and me, and we are at PST
timezone.
Thanks
-min
--
Cloudian KK - http://www.cloudian.com/get-started.html
Fancy 100TB of full featured S3 Storage?
Checkout the Cloudian(r) Community Edition!
--
Cloudian KK - http://www.cloudian.com/get-started.html
Fancy 100TB of full featured S3 Storage?
Checkout the Cloudian® Community Edition!



Re: Object based Secondary storage.

Posted by John Burwell <jb...@basho.com>.
Thomas,

The AWS API explicitly states the ETag is not guaranteed to be an integrity hash [1].  According to RFC 2616 [2], clients should not infer any meaning to the content of an ETag.  Essentially, it is an opaque version identifier which should only be compared for equality to another ETag value to detect a resource change.  As such, I agree with your assessment that s3cmd is making an invalid assumption regarding the value of the ETag.

Min, could you please send the stack trace you are receiving from TransferManager?  Also, could you send a reference to the code in the Git repo?  With that information, we can start running down the source of the problem.

Thanks,
-John

[1]: http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
[2]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
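
To make the equality-only usage above concrete, here is a minimal sketch, assuming the GetObjectRequest ETag-constraint setters behave as their If-Match / If-None-Match semantics suggest (the class and method names are invented; bucket and key reuse Edison's example names). The SDK's own getObject(GetObjectRequest, File) comments note that getObject returns null when a constraint is not met, so the client never has to parse the ETag at all:

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.model.GetObjectRequest;
    import com.amazonaws.services.s3.model.S3Object;

    public class ETagAsVersionId {
        // Fetches the object only if its ETag differs from the one we saw
        // last; returns null when the resource is unchanged. The ETag is
        // passed through verbatim and compared server-side for equality.
        public static S3Object fetchIfChanged(AmazonS3 s3, String lastSeenETag) {
            GetObjectRequest request = new GetObjectRequest("imagestore",
                    "tmpl/1/1/routing-1/test")
                    .withNonmatchingETagConstraint(lastSeenETag);
            return s3.getObject(request);
        }
    }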

On Jun 7, 2013, at 1:08 AM, Thomas O'Dowd <tp...@cloudian.com> wrote:

> Min,
> 
> This looks like an s3cmd problem. I just downloaded the latest s3cmd to
> check the source code.
> 
> In S3/FileLists.py:
> 
>        compare_md5 = 'md5' in cfg.sync_checks
>        # Multipart-uploaded files don't have a valid md5 sum - it ends with "...-nn"
>        if compare_md5:
>            if (src_remote == True and src_list[file]['md5'].find("-") >= 0) or (dst_remote == True and dst_list[file]['md5'].find("-") >= 0):
> 
> Basically, s3cmd is trying to verify that the checksum of the data that
> it downloads is the same as the etag unless the etag ends with "-YYY".
> This is an AWS convention (as I mentioned in an earlier mail) so it
> works but it seems that RiakCS has a different ETAG format which doesn't
> match -YYY so s3cmd assumes the other type of ETAG which is the same as
> the MD5 checksum. For RiakCS however, this is not the case. This is why
> you get the checksum error.
> 
> Chances are that Riak is doing the right thing here and the data file
> will be the same as what you uploaded. You could change the s3cmd code
> to be more lenient for Riak. The Basho guys might either like to change
> their format or talk to the different tool vendors about changing the
> tools to work with Riak. For Cloudian, we choose to try to keep it
> similar to AWS so we could avoid stuff like this.
> 
> Tom.
> 
> On Fri, 2013-06-07 at 04:02 +0000, Min Chen wrote:
>> John,
>>  We are not able to successfully download file that was uploaded to Riak CS with TransferManager using S3cmd. Same error as we encountered using amazon s3 java client due to the incompatible ETAG format ( - and _ difference).
>> 
>> Thanks
>> -min
>> 
>> 
>> 
>> On Jun 6, 2013, at 5:40 PM, "John Burwell" <jb...@basho.com> wrote:
>> 
>>> Edison,
>>> 
>>> Riak CS and S3 seed their hashes differently -- causing the form to appear slightly different.  In particular, Riak CS uses URI-safe base64 encoding which explains why the ETag values contain "_"s instead of "-"s.  From a client perspective, the ETags are treated as opaque strings that are passed through to the server for processing and compared strictly for equality.  Therefore, the form of the hash will not cause the client to choke, and the Riak CS behavior you are seeing is S3 API compatible (see http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html for more details).  
>>> 
>>> Were you able to successfully download the file from Riak CS using s3cmd?
>>> 
>>> Thanks,
>>> -John
>>> 
>>> 
>>> On Jun 6, 2013, at 6:57 PM, Edison Su <Ed...@citrix.com> wrote:
>>> 
>>>> The Etag created by both RIAK CS and Amazon S3 seems a little bit different, in case of multi part upload.
>>>> 
>>>> Here is the result I tested on both RIAK CS and Amazon S3, with s3cmd.
>>>> Test environment:
>>>> S3cmd: version: version 1.5.0-alpha1
>>>> Riak cs:
>>>> Name        : riak
>>>> Arch        : x86_64
>>>> Version     : 1.3.1
>>>> Release     : 1.el6
>>>> Size        : 40 M
>>>> Repo        : installed
>>>> From repo   : basho-products
>>>> 
>>>> The command I used to put:
>>>> s3cmd put some-file s3://some-path --multipart-chunk-size-mb=100 -v -d
>>>> 
>>>> The etag created for the file, when using Riak CS is WxEUkiQzTWm_2C8A92fLQg==
>>>> 
>>>> EBUG: Sending request method_string='POST', uri='http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test?uploadId=kfDkh7Q_QCWN7r0ZTqNq4Q==', headers={'content-length': '309', 'Authorization': 'AWS OYAZXCAFUC1DAFOXNJWI:xlkHI9tUfUV/N+Ekqpi7Jz/pbOI=', 'x-amz-date': 'Thu, 06 Jun 2013 22:54:28 +0000'}, body=(309 bytes)
>>>> DEBUG: Response: {'status': 200, 'headers': {'date': 'Thu, 06 Jun 2013 22:40:09 GMT', 'content-length': '326', 'content-type': 'application/xml', 'server': 'Riak CS'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?><CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test</Location><Bucket>imagestore</Bucket><Key>tmpl/1/1/routing-1/test</Key><ETag>kfDkh7Q_QCWN7r0ZTqNq4Q==</ETag></CompleteMultipartUploadResult>'}
>>>> 
>>>> While the etag created by Amazon S3 is: &quot;70e1860be687d43c039873adef4280f2-3&quot;
>>>> 
>>>> DEBUG: Sending request method_string='POST', uri='/fixes/icecake/systdfdfdfemvm.iso1?uploadId=vdkPSAtaA7g.fdfdfdfdf..iaKRNW_8QGz.bXdfdfdfdfdfkFXwUwLzRcG5obVvJFDvnhYUFdT6fYr1rig--', 
>>>> DEBUG: Response: {'status': 200, 'headers': {, 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'Keep-Alive', 'x-amz-request-id': '8DFF5D8025E58E99', 'cache-control': 'proxy-revalidate', 'date': 'Thu, 06 Jun 2013 22:39:47 GMT', 'content-type': 'application/xml'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?>\n\n<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://fdfdfdfdfdfdf</Location>Key>fixes/icecake/systemvm.iso1</Key><ETag>&quot;70e1860be687d43c039873adef4280f2-3&quot;</ETag></CompleteMultipartUploadResult>'}
>>>> 
>>>> So the etag created on Amazon S3 has "-"(dash) in it, but there is only "_" (underscore) on Riak cs. 
>>>> 
>>>> Do you know the reason? What should we need to do to make it compatible with Amazon S3 SDK?
>>>> 
>>>>> -----Original Message-----
>>>>> From: John Burwell [mailto:jburwell@basho.com]
>>>>> Sent: Thursday, June 06, 2013 2:03 PM
>>>>> To: dev@cloudstack.apache.org
>>>>> Subject: Re: Object based Secondary storage.
>>>>> 
>>>>> Min,
>>>>> 
>>>>> Are you calculating the MD5 or letting the Amazon client do it?
>>>>> 
>>>>> Thanks,
>>>>> -John
>>>>> 
>>>>> On Jun 6, 2013, at 4:54 PM, Min Chen <mi...@citrix.com> wrote:
>>>>> 
>>>>>> Thanks Tom. Indeed I have a S3 question that need some advise from
>>>>>> some S3 experts. To support upload object > 5G, I have used
>>>>>> TransferManager.upload to upload object to S3, upload went fine and
>>>>>> object are successfully put to S3. However, later on when I am using
>>>>>> "s3cmd get <object key>" to retrieve this object, I always got this exception:
>>>>>> 
>>>>>> "MD5 signatures do not match: computed=Y, received="X"
>>>>>> 
>>>>>> It seems that Amazon S3 kept a different Md5 sum for the multi-part
>>>>>> uploaded object. We have been using Riak CS for our S3 testing. If I
>>>>>> changed to not using multi-part upload and directly invoking S3
>>>>>> putObject, I will not run into this issue. Do you have such experience
>>>>> before?
>>>>>> 
>>>>>> -min
>>>>>> 
>>>>>> On 6/6/13 1:56 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
>>>>>> 
>>>>>>> Thanks Min. I've printed out the material and am reading new threads.
>>>>>>> Can't comment much yet until I understand things a bit more.
>>>>>>> 
>>>>>>> Meanwhile, feel free to hit me up with any S3 questions you have. I'm
>>>>>>> looking forward to playing with the object_store branch and testing
>>>>>>> it out.
>>>>>>> 
>>>>>>> Tom.
>>>>>>> 
>>>>>>> On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
>>>>>>>> Welcome Tom. You can check out this FS
>>>>>>>> 
>>>>>>>> 
>>>>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Object+Store+Plugin+Framework
>>>>>>>> for secondary storage architectural work done in
>>>>>>>> object_store branch. You may also check out the following recent
>>>>>>>> threads regarding 3 major technical questions raised by community as
>>>>>>>> well as our answers and clarification.
>>>>>>>> 
>>>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77B337AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
>>>>>>>> 
>>>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD22955.3DDDC%25min.chen%40citrix.com%3E
>>>>>>>> 
>>>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2300D.3DE0C%25min.chen%40citrix.com%3E
>>>>>>>> 
>>>>>>>> 
>>>>>>>> That branch is mainly worked on by Edison and me, and we are at PST
>>>>>>>> timezone.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> -min
>>>>>>> --
>>>>>>> Cloudian KK - http://www.cloudian.com/get-started.html
>>>>>>> Fancy 100TB of full featured S3 Storage?
>>>>>>> Checkout the Cloudian(r) Community Edition!
>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
> 
> -- 
> Cloudian KK - http://www.cloudian.com/get-started.html
> Fancy 100TB of full featured S3 Storage?
> Checkout the Cloudian® Community Edition!
> 


Re: Object based Secondary storage.

Posted by Thomas O'Dowd <tp...@cloudian.com>.
Min,

This looks like an s3cmd problem. I just downloaded the latest s3cmd to
check the source code.

In S3/FileLists.py:

        compare_md5 = 'md5' in cfg.sync_checks
        # Multipart-uploaded files don't have a valid md5 sum - it ends with "...-nn"
        if compare_md5:
            if (src_remote == True and src_list[file]['md5'].find("-") >= 0) or (dst_remote == True and dst_list[file]['md5'].find("-") >= 0):

Basically, s3cmd is trying to verify that the checksum of the data that
it downloads is the same as the etag unless the etag ends with "-YYY".
This is an AWS convention (as I mentioned in an earlier mail) so it
works but it seems that RiakCS has a different ETAG format which doesn't
match -YYY so s3cmd assumes the other type of ETAG which is the same as
the MD5 checksum. For RiakCS however, this is not the case. This is why
you get the checksum error.

Chances are that Riak is doing the right thing here and the data file
will be the same as what you uploaded. You could change the s3cmd code
to be more lenient for Riak. The Basho guys might either like to change
their format or talk to the different tool vendors about changing the
tools to work with Riak. For Cloudian, we choose to try to keep it
similar to AWS so we could avoid stuff like this.
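
If anyone does patch a client, a sketch of what "more lenient" could mean (illustrative only -- Java here to match the SDK discussion, although s3cmd itself is Python, and the class and method names are made up): treat the ETag as an MD5 only when it actually looks like one, instead of keying off the "-" suffix:

    import java.util.regex.Pattern;

    public class ETagHeuristic {
        // A plain-object ETag on AWS is the hex MD5 of the content:
        // exactly 32 hex characters, usually wrapped in double quotes.
        private static final Pattern HEX_MD5 = Pattern.compile("[0-9a-fA-F]{32}");

        // Returns true only for ETags that can plausibly be compared
        // against a locally computed MD5; base64-style ETags such as
        // Riak CS's "kfDkh7Q_QCWN7r0ZTqNq4Q==" fail this test and would
        // simply be treated as opaque version ids.
        public static boolean looksLikeMd5(String eTag) {
            return HEX_MD5.matcher(eTag.replace("\"", "")).matches();
        }
    }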

Tom.

On Fri, 2013-06-07 at 04:02 +0000, Min Chen wrote:
> John,
>   We are not able to successfully download file that was uploaded to Riak CS with TransferManager using S3cmd. Same error as we encountered using amazon s3 java client due to the incompatible ETAG format ( - and _ difference).
> 
> Thanks
> -min
> 
> 
> 
> On Jun 6, 2013, at 5:40 PM, "John Burwell" <jb...@basho.com> wrote:
> 
> > Edison,
> > 
> > Riak CS and S3 seed their hashes differently -- causing the form to appear slightly different.  In particular, Riak CS uses URI-safe base64 encoding which explains why the ETag values contain "_"s instead of "-"s.  From a client perspective, the ETags are treated as opaque strings that are passed through to the server for processing and compared strictly for equality.  Therefore, the form of the hash will not cause the client to choke, and the Riak CS behavior you are seeing is S3 API compatible (see http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html for more details).  
> > 
> > Were you able to successfully download the file from Riak CS using s3cmd?
> > 
> > Thanks,
> > -John
> > 
> > 
> > On Jun 6, 2013, at 6:57 PM, Edison Su <Ed...@citrix.com> wrote:
> > 
> >> The Etag created by both RIAK CS and Amazon S3 seems a little bit different, in case of multi part upload.
> >> 
> >> Here is the result I tested on both RIAK CS and Amazon S3, with s3cmd.
> >> Test environment:
> >> S3cmd: version: version 1.5.0-alpha1
> >> Riak cs:
> >> Name        : riak
> >> Arch        : x86_64
> >> Version     : 1.3.1
> >> Release     : 1.el6
> >> Size        : 40 M
> >> Repo        : installed
> >> From repo   : basho-products
> >> 
> >> The command I used to put:
> >> s3cmd put some-file s3://some-path --multipart-chunk-size-mb=100 -v -d
> >> 
> >> The etag created for the file, when using Riak CS is WxEUkiQzTWm_2C8A92fLQg==
> >> 
> >> EBUG: Sending request method_string='POST', uri='http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test?uploadId=kfDkh7Q_QCWN7r0ZTqNq4Q==', headers={'content-length': '309', 'Authorization': 'AWS OYAZXCAFUC1DAFOXNJWI:xlkHI9tUfUV/N+Ekqpi7Jz/pbOI=', 'x-amz-date': 'Thu, 06 Jun 2013 22:54:28 +0000'}, body=(309 bytes)
> >> DEBUG: Response: {'status': 200, 'headers': {'date': 'Thu, 06 Jun 2013 22:40:09 GMT', 'content-length': '326', 'content-type': 'application/xml', 'server': 'Riak CS'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?><CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test</Location><Bucket>imagestore</Bucket><Key>tmpl/1/1/routing-1/test</Key><ETag>kfDkh7Q_QCWN7r0ZTqNq4Q==</ETag></CompleteMultipartUploadResult>'}
> >> 
> >> While the etag created by Amazon S3 is: &quot;70e1860be687d43c039873adef4280f2-3&quot;
> >> 
> >> DEBUG: Sending request method_string='POST', uri='/fixes/icecake/systdfdfdfemvm.iso1?uploadId=vdkPSAtaA7g.fdfdfdfdf..iaKRNW_8QGz.bXdfdfdfdfdfkFXwUwLzRcG5obVvJFDvnhYUFdT6fYr1rig--', 
> >> DEBUG: Response: {'status': 200, 'headers': {, 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'Keep-Alive', 'x-amz-request-id': '8DFF5D8025E58E99', 'cache-control': 'proxy-revalidate', 'date': 'Thu, 06 Jun 2013 22:39:47 GMT', 'content-type': 'application/xml'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?>\n\n<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://fdfdfdfdfdfdf</Location>Key>fixes/icecake/systemvm.iso1</Key><ETag>&quot;70e1860be687d43c039873adef4280f2-3&quot;</ETag></CompleteMultipartUploadResult>'}
> >> 
> >> So the etag created on Amazon S3 has "-"(dash) in it, but there is only "_" (underscore) on Riak cs. 
> >> 
> >> Do you know the reason? What should we need to do to make it compatible with Amazon S3 SDK?
> >> 
> >>> -----Original Message-----
> >>> From: John Burwell [mailto:jburwell@basho.com]
> >>> Sent: Thursday, June 06, 2013 2:03 PM
> >>> To: dev@cloudstack.apache.org
> >>> Subject: Re: Object based Secondary storage.
> >>> 
> >>> Min,
> >>> 
> >>> Are you calculating the MD5 or letting the Amazon client do it?
> >>> 
> >>> Thanks,
> >>> -John
> >>> 
> >>> On Jun 6, 2013, at 4:54 PM, Min Chen <mi...@citrix.com> wrote:
> >>> 
> >>>> Thanks Tom. Indeed I have a S3 question that need some advise from
> >>>> some S3 experts. To support upload object > 5G, I have used
> >>>> TransferManager.upload to upload object to S3, upload went fine and
> >>>> object are successfully put to S3. However, later on when I am using
> >>>> "s3cmd get <object key>" to retrieve this object, I always got this exception:
> >>>> 
> >>>> "MD5 signatures do not match: computed=Y, received="X"
> >>>> 
> >>>> It seems that Amazon S3 kept a different Md5 sum for the multi-part
> >>>> uploaded object. We have been using Riak CS for our S3 testing. If I
> >>>> changed to not using multi-part upload and directly invoking S3
> >>>> putObject, I will not run into this issue. Do you have such experience
> >>> before?
> >>>> 
> >>>> -min
> >>>> 
> >>>> On 6/6/13 1:56 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
> >>>> 
> >>>>> Thanks Min. I've printed out the material and am reading new threads.
> >>>>> Can't comment much yet until I understand things a bit more.
> >>>>> 
> >>>>> Meanwhile, feel free to hit me up with any S3 questions you have. I'm
> >>>>> looking forward to playing with the object_store branch and testing
> >>>>> it out.
> >>>>> 
> >>>>> Tom.
> >>>>> 
> >>>>> On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
> >>>>>> Welcome Tom. You can check out this FS
> >>>>>> 
> >>>>>> 
> >>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Object+Store+Plugin+Framework
> >>>>>> for secondary storage architectural work done in
> >>>>>> object_store branch. You may also check out the following recent
> >>>>>> threads regarding 3 major technical questions raised by community as
> >>>>>> well as our answers and clarification.
> >>>>>> 
> >>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77B337AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
> >>>>>> 
> >>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD22955.3DDDC%25min.chen%40citrix.com%3E
> >>>>>> 
> >>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2300D.3DE0C%25min.chen%40citrix.com%3E
> >>>>>> 
> >>>>>> 
> >>>>>> That branch is mainly worked on by Edison and me, and we are at PST
> >>>>>> timezone.
> >>>>>> 
> >>>>>> Thanks
> >>>>>> -min
> >>>>> --
> >>>>> Cloudian KK - http://www.cloudian.com/get-started.html
> >>>>> Fancy 100TB of full featured S3 Storage?
> >>>>> Checkout the Cloudian(r) Community Edition!
> >>>>> 
> >>>> 
> >> 
> > 
> 

-- 
Cloudian KK - http://www.cloudian.com/get-started.html
Fancy 100TB of full featured S3 Storage?
Checkout the Cloudian® Community Edition!


Re: Object based Secondary storage.

Posted by Min Chen <mi...@citrix.com>.
John,
  We are not able to successfully download file that was uploaded to Riak CS with TransferManager using S3cmd. Same error as we encountered using amazon s3 java client due to the incompatible ETAG format ( - and _ difference).

Thanks
-min



On Jun 6, 2013, at 5:40 PM, "John Burwell" <jb...@basho.com> wrote:

> Edison,
> 
> Riak CS and S3 seed their hashes differently -- causing the form to appear slightly different.  In particular, Riak CS uses URI-safe base64 encoding which explains why the ETag values contain "_"s instead of "-"s.  From a client perspective, the ETags are treated as opaque strings that are passed through to the server for processing and compared strictly for equality.  Therefore, the form of the hash will not cause the client to choke, and the Riak CS behavior you are seeing is S3 API compatible (see http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html for more details).  
> 
> Were you able to successfully download the file from Riak CS using s3cmd?
> 
> Thanks,
> -John
> 
> 
> On Jun 6, 2013, at 6:57 PM, Edison Su <Ed...@citrix.com> wrote:
> 
>> The Etag created by both RIAK CS and Amazon S3 seems a little bit different, in case of multi part upload.
>> 
>> Here is the result I tested on both RIAK CS and Amazon S3, with s3cmd.
>> Test environment:
>> S3cmd: version: version 1.5.0-alpha1
>> Riak cs:
>> Name        : riak
>> Arch        : x86_64
>> Version     : 1.3.1
>> Release     : 1.el6
>> Size        : 40 M
>> Repo        : installed
>> From repo   : basho-products
>> 
>> The command I used to put:
>> s3cmd put some-file s3://some-path --multipart-chunk-size-mb=100 -v -d
>> 
>> The etag created for the file, when using Riak CS is WxEUkiQzTWm_2C8A92fLQg==
>> 
>> EBUG: Sending request method_string='POST', uri='http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test?uploadId=kfDkh7Q_QCWN7r0ZTqNq4Q==', headers={'content-length': '309', 'Authorization': 'AWS OYAZXCAFUC1DAFOXNJWI:xlkHI9tUfUV/N+Ekqpi7Jz/pbOI=', 'x-amz-date': 'Thu, 06 Jun 2013 22:54:28 +0000'}, body=(309 bytes)
>> DEBUG: Response: {'status': 200, 'headers': {'date': 'Thu, 06 Jun 2013 22:40:09 GMT', 'content-length': '326', 'content-type': 'application/xml', 'server': 'Riak CS'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?><CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test</Location><Bucket>imagestore</Bucket><Key>tmpl/1/1/routing-1/test</Key><ETag>kfDkh7Q_QCWN7r0ZTqNq4Q==</ETag></CompleteMultipartUploadResult>'}
>> 
>> While the etag created by Amazon S3 is: &quot;70e1860be687d43c039873adef4280f2-3&quot;
>> 
>> DEBUG: Sending request method_string='POST', uri='/fixes/icecake/systdfdfdfemvm.iso1?uploadId=vdkPSAtaA7g.fdfdfdfdf..iaKRNW_8QGz.bXdfdfdfdfdfkFXwUwLzRcG5obVvJFDvnhYUFdT6fYr1rig--', 
>> DEBUG: Response: {'status': 200, 'headers': {, 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'Keep-Alive', 'x-amz-request-id': '8DFF5D8025E58E99', 'cache-control': 'proxy-revalidate', 'date': 'Thu, 06 Jun 2013 22:39:47 GMT', 'content-type': 'application/xml'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?>\n\n<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://fdfdfdfdfdfdf</Location>Key>fixes/icecake/systemvm.iso1</Key><ETag>&quot;70e1860be687d43c039873adef4280f2-3&quot;</ETag></CompleteMultipartUploadResult>'}
>> 
>> So the etag created on Amazon S3 has "-"(dash) in it, but there is only "_" (underscore) on Riak cs. 
>> 
>> Do you know the reason? What should we need to do to make it compatible with Amazon S3 SDK?
>> 
>>> -----Original Message-----
>>> From: John Burwell [mailto:jburwell@basho.com]
>>> Sent: Thursday, June 06, 2013 2:03 PM
>>> To: dev@cloudstack.apache.org
>>> Subject: Re: Object based Secondary storage.
>>> 
>>> Min,
>>> 
>>> Are you calculating the MD5 or letting the Amazon client do it?
>>> 
>>> Thanks,
>>> -John
>>> 
>>> On Jun 6, 2013, at 4:54 PM, Min Chen <mi...@citrix.com> wrote:
>>> 
>>>> Thanks Tom. Indeed I have a S3 question that need some advise from
>>>> some S3 experts. To support upload object > 5G, I have used
>>>> TransferManager.upload to upload object to S3, upload went fine and
>>>> object are successfully put to S3. However, later on when I am using
>>>> "s3cmd get <object key>" to retrieve this object, I always got this exception:
>>>> 
>>>> "MD5 signatures do not match: computed=Y, received="X"
>>>> 
>>>> It seems that Amazon S3 kept a different Md5 sum for the multi-part
>>>> uploaded object. We have been using Riak CS for our S3 testing. If I
>>>> changed to not using multi-part upload and directly invoking S3
>>>> putObject, I will not run into this issue. Do you have such experience
>>> before?
>>>> 
>>>> -min
>>>> 
>>>> On 6/6/13 1:56 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
>>>> 
>>>>> Thanks Min. I've printed out the material and am reading new threads.
>>>>> Can't comment much yet until I understand things a bit more.
>>>>> 
>>>>> Meanwhile, feel free to hit me up with any S3 questions you have. I'm
>>>>> looking forward to playing with the object_store branch and testing
>>>>> it out.
>>>>> 
>>>>> Tom.
>>>>> 
>>>>> On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
>>>>>> Welcome Tom. You can check out this FS
>>>>>> 
>>>>>> 
> >>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Object+Store+Plugin+Framework
> >>>>>> for secondary storage architectural work done in
> >>>>>> object_store branch. You may also check out the following recent
> >>>>>> threads regarding 3 major technical questions raised by community as
> >>>>>> well as our answers and clarification.
> >>>>>> 
> >>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77B337AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
> >>>>>> 
> >>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD22955.3DDDC%25min.chen%40citrix.com%3E
> >>>>>> 
> >>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2300D.3DE0C%25min.chen%40citrix.com%3E
>>>>>> 
>>>>>> 
>>>>>> That branch is mainly worked on by Edison and me, and we are at PST
>>>>>> timezone.
>>>>>> 
>>>>>> Thanks
>>>>>> -min
>>>>> --
>>>>> Cloudian KK - http://www.cloudian.com/get-started.html
>>>>> Fancy 100TB of full featured S3 Storage?
>>>>> Checkout the Cloudian(r) Community Edition!
>>>>> 
>>>> 
>> 
> 


Re: Object based Secondary storage.

Posted by John Burwell <jb...@basho.com>.
Edison,

Riak CS and S3 seed their hashes differently -- causing the form to appear slightly different.  In particular, Riak CS uses URI-safe base64 encoding which explains why the ETag values contain "_"s instead of "-"s.  From a client perspective, the ETags are treated as opaque strings that are passed through to the server for processing and compared strictly for equality.  Therefore, the form of the hash will not cause the client to choke, and the Riak CS behavior you are seeing is S3 API compatible (see http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html for more details).  

Were you able to successfully download the file from Riak CS using s3cmd?

Thanks,
-John


On Jun 6, 2013, at 6:57 PM, Edison Su <Ed...@citrix.com> wrote:

> The Etag created by both RIAK CS and Amazon S3 seems a little bit different, in case of multi part upload.
> 
> Here is the result I tested on both RIAK CS and Amazon S3, with s3cmd.
> Test environment:
> S3cmd: version: version 1.5.0-alpha1
> Riak cs:
> Name        : riak
> Arch        : x86_64
> Version     : 1.3.1
> Release     : 1.el6
> Size        : 40 M
> Repo        : installed
> From repo   : basho-products
> 
> The command I used to put:
> s3cmd put some-file s3://some-path --multipart-chunk-size-mb=100 -v -d
> 
> The etag created for the file, when using Riak CS is WxEUkiQzTWm_2C8A92fLQg==
> 
> EBUG: Sending request method_string='POST', uri='http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test?uploadId=kfDkh7Q_QCWN7r0ZTqNq4Q==', headers={'content-length': '309', 'Authorization': 'AWS OYAZXCAFUC1DAFOXNJWI:xlkHI9tUfUV/N+Ekqpi7Jz/pbOI=', 'x-amz-date': 'Thu, 06 Jun 2013 22:54:28 +0000'}, body=(309 bytes)
> DEBUG: Response: {'status': 200, 'headers': {'date': 'Thu, 06 Jun 2013 22:40:09 GMT', 'content-length': '326', 'content-type': 'application/xml', 'server': 'Riak CS'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?><CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test</Location><Bucket>imagestore</Bucket><Key>tmpl/1/1/routing-1/test</Key><ETag>kfDkh7Q_QCWN7r0ZTqNq4Q==</ETag></CompleteMultipartUploadResult>'}
> 
> While the etag created by Amazon S3 is: &quot;70e1860be687d43c039873adef4280f2-3&quot;
> 
> DEBUG: Sending request method_string='POST', uri='/fixes/icecake/systdfdfdfemvm.iso1?uploadId=vdkPSAtaA7g.fdfdfdfdf..iaKRNW_8QGz.bXdfdfdfdfdfkFXwUwLzRcG5obVvJFDvnhYUFdT6fYr1rig--', 
> DEBUG: Response: {'status': 200, 'headers': {, 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'Keep-Alive', 'x-amz-request-id': '8DFF5D8025E58E99', 'cache-control': 'proxy-revalidate', 'date': 'Thu, 06 Jun 2013 22:39:47 GMT', 'content-type': 'application/xml'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?>\n\n<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://fdfdfdfdfdfdf</Location>Key>fixes/icecake/systemvm.iso1</Key><ETag>&quot;70e1860be687d43c039873adef4280f2-3&quot;</ETag></CompleteMultipartUploadResult>'}
> 
> So the etag created on Amazon S3 has "-"(dash) in it, but there is only "_" (underscore) on Riak cs. 
> 
> Do you know the reason? What should we need to do to make it compatible with Amazon S3 SDK?
> 
>> -----Original Message-----
>> From: John Burwell [mailto:jburwell@basho.com]
>> Sent: Thursday, June 06, 2013 2:03 PM
>> To: dev@cloudstack.apache.org
>> Subject: Re: Object based Secondary storage.
>> 
>> Min,
>> 
>> Are you calculating the MD5 or letting the Amazon client do it?
>> 
>> Thanks,
>> -John
>> 
>> On Jun 6, 2013, at 4:54 PM, Min Chen <mi...@citrix.com> wrote:
>> 
>>> Thanks Tom. Indeed I have a S3 question that need some advise from
>>> some S3 experts. To support upload object > 5G, I have used
>>> TransferManager.upload to upload object to S3, upload went fine and
>>> object are successfully put to S3. However, later on when I am using
>>> "s3cmd get <object key>" to retrieve this object, I always got this exception:
>>> 
>>> "MD5 signatures do not match: computed=Y, received="X"
>>> 
>>> It seems that Amazon S3 kept a different Md5 sum for the multi-part
>>> uploaded object. We have been using Riak CS for our S3 testing. If I
>>> changed to not using multi-part upload and directly invoking S3
>>> putObject, I will not run into this issue. Do you have such experience
>> before?
>>> 
>>> -min
>>> 
>>> On 6/6/13 1:56 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
>>> 
>>>> Thanks Min. I've printed out the material and am reading new threads.
>>>> Can't comment much yet until I understand things a bit more.
>>>> 
>>>> Meanwhile, feel free to hit me up with any S3 questions you have. I'm
>>>> looking forward to playing with the object_store branch and testing
>>>> it out.
>>>> 
>>>> Tom.
>>>> 
>>>> On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
>>>>> Welcome Tom. You can check out this FS
>>>>> 
>>>>> 
>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Object+Store+Plugin+Framework
>>>>> for secondary storage architectural work done in
>>>>> object_store branch. You may also check out the following recent
>>>>> threads regarding 3 major technical questions raised by community as
>>>>> well as our answers and clarification.
>>>>> 
>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77B337AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
>>>>> 
>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD22955.3DDDC%25min.chen%40citrix.com%3E
>>>>> 
>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2300D.3DE0C%25min.chen%40citrix.com%3E
>>>>> 
>>>>> 
>>>>> That branch is mainly worked on by Edison and me, and we are at PST
>>>>> timezone.
>>>>> 
>>>>> Thanks
>>>>> -min
>>>> --
>>>> Cloudian KK - http://www.cloudian.com/get-started.html
>>>> Fancy 100TB of full featured S3 Storage?
>>>> Checkout the Cloudian(r) Community Edition!
>>>> 
>>> 
> 


RE: Object based Secondary storage.

Posted by Edison Su <Ed...@citrix.com>.
The Etag created by both RIAK CS and Amazon S3 seems a little bit different, in case of multi part upload.

Here is the result I tested on both RIAK CS and Amazon S3, with s3cmd.
Test environment:
S3cmd: version: version 1.5.0-alpha1
Riak cs:
Name        : riak
Arch        : x86_64
Version     : 1.3.1
Release     : 1.el6
Size        : 40 M
Repo        : installed
From repo   : basho-products

The command I used to put:
s3cmd put some-file s3://some-path --multipart-chunk-size-mb=100 -v -d

The etag created for the file, when using Riak CS is WxEUkiQzTWm_2C8A92fLQg==

EBUG: Sending request method_string='POST', uri='http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test?uploadId=kfDkh7Q_QCWN7r0ZTqNq4Q==', headers={'content-length': '309', 'Authorization': 'AWS OYAZXCAFUC1DAFOXNJWI:xlkHI9tUfUV/N+Ekqpi7Jz/pbOI=', 'x-amz-date': 'Thu, 06 Jun 2013 22:54:28 +0000'}, body=(309 bytes)
DEBUG: Response: {'status': 200, 'headers': {'date': 'Thu, 06 Jun 2013 22:40:09 GMT', 'content-length': '326', 'content-type': 'application/xml', 'server': 'Riak CS'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?><CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test</Location><Bucket>imagestore</Bucket><Key>tmpl/1/1/routing-1/test</Key><ETag>kfDkh7Q_QCWN7r0ZTqNq4Q==</ETag></CompleteMultipartUploadResult>'}

While the etag created by Amazon S3 is: &quot;70e1860be687d43c039873adef4280f2-3&quot;

DEBUG: Sending request method_string='POST', uri='/fixes/icecake/systdfdfdfemvm.iso1?uploadId=vdkPSAtaA7g.fdfdfdfdf..iaKRNW_8QGz.bXdfdfdfdfdfkFXwUwLzRcG5obVvJFDvnhYUFdT6fYr1rig--', 
DEBUG: Response: {'status': 200, 'headers': {, 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'Keep-Alive', 'x-amz-request-id': '8DFF5D8025E58E99', 'cache-control': 'proxy-revalidate', 'date': 'Thu, 06 Jun 2013 22:39:47 GMT', 'content-type': 'application/xml'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?>\n\n<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://fdfdfdfdfdfdf</Location>Key>fixes/icecake/systemvm.iso1</Key><ETag>&quot;70e1860be687d43c039873adef4280f2-3&quot;</ETag></CompleteMultipartUploadResult>'}

So the ETag created by Amazon S3 has a "-" (dash) in it, while the one from Riak CS only has a "_" (underscore).

Do you know the reason? What do we need to do to make this compatible with the Amazon S3 SDK?
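
To make the difference concrete, here is a quick sketch that classifies the
two ETags above purely by shape. The dash-suffix pattern is only inferred
from these logs, not a documented contract, so treat it as an assumption:

import java.util.regex.Pattern;

public class EtagShape {
    // 32 hex chars, a dash, then a part count -- the shape Amazon S3
    // returned in the log above for a multipart upload.
    private static final Pattern AWS_MULTIPART =
            Pattern.compile("[0-9a-f]{32}-\\d+");
    // Plain 32-hex-char MD5 -- the shape of a single-PUT ETag.
    private static final Pattern PLAIN_MD5 =
            Pattern.compile("[0-9a-f]{32}");

    public static void main(String[] args) {
        String awsEtag  = "70e1860be687d43c039873adef4280f2-3"; // from the S3 log above
        String riakEtag = "WxEUkiQzTWm_2C8A92fLQg==";           // from the Riak CS log above
        for (String etag : new String[] { awsEtag, riakEtag }) {
            if (AWS_MULTIPART.matcher(etag).matches()) {
                System.out.println(etag + " -> AWS-style multipart ETag");
            } else if (PLAIN_MD5.matcher(etag).matches()) {
                System.out.println(etag + " -> plain MD5 ETag");
            } else {
                System.out.println(etag + " -> neither (Riak CS returns base64 here)");
            }
        }
    }
}

Any client that assumes the AWS shape (hex MD5, optionally with a dash and
part count) will fall through to the last branch on the Riak CS value.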

> -----Original Message-----
> From: John Burwell [mailto:jburwell@basho.com]
> Sent: Thursday, June 06, 2013 2:03 PM
> To: dev@cloudstack.apache.org
> Subject: Re: Object based Secondary storage.
> 
> Min,
> 
> Are you calculating the MD5 or letting the Amazon client do it?
> 
> Thanks,
> -John
> 
> On Jun 6, 2013, at 4:54 PM, Min Chen <mi...@citrix.com> wrote:
> 
> > Thanks Tom. Indeed I have a S3 question that need some advise from
> > some S3 experts. To support upload object > 5G, I have used
> > TransferManager.upload to upload object to S3, upload went fine and
> > object are successfully put to S3. However, later on when I am using
> > "s3cmd get <object key>" to retrieve this object, I always got this exception:
> >
> > "MD5 signatures do not match: computed=Y, received="X"
> >
> > It seems that Amazon S3 kept a different Md5 sum for the multi-part
> > uploaded object. We have been using Riak CS for our S3 testing. If I
> > changed to not using multi-part upload and directly invoking S3
> > putObject, I will not run into this issue. Do you have such experience
> before?
> >
> > -min
> >
> > On 6/6/13 1:56 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
> >
> >> Thanks Min. I've printed out the material and am reading new threads.
> >> Can't comment much yet until I understand things a bit more.
> >>
> >> Meanwhile, feel free to hit me up with any S3 questions you have. I'm
> >> looking forward to playing with the object_store branch and testing
> >> it out.
> >>
> >> Tom.
> >>
> >> On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
> >>> Welcome Tom. You can check out this FS
> >>>
> >>>
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backu
> >>> p+Obj
> >>> ec
> >>> t+Store+Plugin+Framework for secondary storage architectural work done
> >>> in
> >>> object_store branch.You may also check out the following recent
> >>> threads regarding 3 major technical questions raised by community as
> >>> well as our answers and clarification.
> >>>
> >>> http://mail-archives.apache.org/mod_mbox/cloudstack-
> dev/201306.mbox/
> >>> %3C77
> >>> B3
> >>>
> 37AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
> >>>
> >>> http://mail-archives.apache.org/mod_mbox/cloudstack-
> dev/201306.mbox/
> >>> %3CCD
> >>> D2
> >>> 2955.3DDDC%25min.chen%40citrix.com%3E
> >>>
> >>> http://mail-archives.apache.org/mod_mbox/cloudstack-
> dev/201306.mbox/
> >>> %3CCD
> >>> D2
> >>> 300D.3DE0C%25min.chen%40citrix.com%3E
> >>>
> >>>
> >>> That branch is mainly worked on by Edison and me, and we are at PST
> >>> timezone.
> >>>
> >>> Thanks
> >>> -min
> >> --
> >> Cloudian KK - http://www.cloudian.com/get-started.html
> >> Fancy 100TB of full featured S3 Storage?
> >> Checkout the Cloudian® Community Edition!
> >>
> >


Re: Object based Secondary storage.

Posted by Min Chen <mi...@citrix.com>.
Hi John,

	I am not calculating the MD5 explicitly. I traced the code to the
ServiceUtils.downloadObjectToFile method in the Amazon S3 SDK; my invocation
of S3Utils.getObject fails at the following code in ServiceUtils:

byte[] clientSideHash = null;
byte[] serverSideHash = null;
try {
    // Multipart Uploads don't have an MD5 calculated on the service side
    if (ServiceUtils.isMultipartUploadETag(s3Object.getObjectMetadata().getETag()) == false) {
        clientSideHash = Md5Utils.computeMD5Hash(new FileInputStream(destinationFile));
        serverSideHash = BinaryUtils.fromHex(s3Object.getObjectMetadata().getETag());
    }
} catch (Exception e) {
    log.warn("Unable to calculate MD5 hash to validate download: " + e.getMessage(), e);
}

if (performIntegrityCheck && clientSideHash != null && serverSideHash != null
        && !Arrays.equals(clientSideHash, serverSideHash)) {
    throw new AmazonClientException("Unable to verify integrity of data download.  "
            + "Client calculated content hash didn't match hash calculated by Amazon S3.  "
            + "The data stored in '" + destinationFile.getAbsolutePath() + "' may be corrupt.");
}
	
Some web discussion mentions that this is related to multipart copy:
http://sourceforge.net/p/s3tools/discussion/618865/thread/50a00c18, but
the resolution there does not seem to work for me.
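
One thing I notice is that the base64-style ETags Riak CS returns (the
"...==" values elsewhere in this thread) are not hex, so they presumably
neither match the SDK's multipart-ETag pattern nor decode cleanly as an MD5
hex string, which would explain the failed comparison. If so, one possible
workaround (just a sketch, untested; bucket, key, and destination are
placeholders) is to stream the object content ourselves instead of letting
the SDK write the file and validate it:

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

public class PlainDownload {
    // Copy the object stream manually so the SDK never compares a local
    // MD5 against the (non-MD5) ETag the server returned.
    public static void download(AmazonS3 s3, String bucket, String key,
            File dest) throws Exception {
        S3Object object = s3.getObject(new GetObjectRequest(bucket, key));
        InputStream in = object.getObjectContent();
        OutputStream out = new FileOutputStream(dest);
        try {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            in.close();
            out.close();
        }
    }
}

The obvious cost is that this drops the integrity check entirely, so we
would want a checksum of our own (for example, one stored as object
metadata at upload time).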

Any advise?

	Thanks
	-min




On 6/6/13 2:02 PM, "John Burwell" <jb...@basho.com> wrote:

>Min,
>
>Are you calculating the MD5 or letting the Amazon client do it?
>
>Thanks,
>-John
>
>On Jun 6, 2013, at 4:54 PM, Min Chen <mi...@citrix.com> wrote:
>
>> Thanks Tom. Indeed I have a S3 question that need some advise from some
>>S3
>> experts. To support upload object > 5G, I have used
>>TransferManager.upload
>> to upload object to S3, upload went fine and object are successfully put
>> to S3. However, later on when I am using "s3cmd get <object key>" to
>> retrieve this object, I always got this exception:
>> 
>> "MD5 signatures do not match: computed=Y, received="X"
>> 
>> It seems that Amazon S3 kept a different Md5 sum for the multi-part
>> uploaded object. We have been using Riak CS for our S3 testing. If I
>> changed to not using multi-part upload and directly invoking S3
>>putObject,
>> I will not run into this issue. Do you have such experience before?
>> 
>> -min
>> 
>> On 6/6/13 1:56 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
>> 
>>> Thanks Min. I've printed out the material and am reading new threads.
>>> Can't comment much yet until I understand things a bit more.
>>> 
>>> Meanwhile, feel free to hit me up with any S3 questions you have. I'm
>>> looking forward to playing with the object_store branch and testing it
>>> out.
>>> 
>>> Tom.
>>> 
>>> On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
>>>> Welcome Tom. You can check out this FS
>>>> 
>>>> 
>>>>https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+O
>>>>bj
>>>> ec
>>>> t+Store+Plugin+Framework for secondary storage architectural work done
>>>> in
>>>> object_store branch.You may also check out the following recent
>>>>threads
>>>> regarding 3 major technical questions raised by community as well as
>>>>our
>>>> answers and clarification.
>>>> 
>>>> 
>>>>http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C
>>>>77
>>>> B3
>>>> 37AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
>>>> 
>>>> 
>>>>http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C
>>>>CD
>>>> D2
>>>> 2955.3DDDC%25min.chen%40citrix.com%3E
>>>> 
>>>> 
>>>>http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C
>>>>CD
>>>> D2
>>>> 300D.3DE0C%25min.chen%40citrix.com%3E
>>>> 
>>>> 
>>>> That branch is mainly worked on by Edison and me, and we are at PST
>>>> timezone. 
>>>> 
>>>> Thanks
>>>> -min
>>> -- 
>>> Cloudian KK - http://www.cloudian.com/get-started.html
>>> Fancy 100TB of full featured S3 Storage?
>>> Checkout the Cloudian® Community Edition!
>>> 
>> 
>


Re: Object based Secondary storage.

Posted by John Burwell <jb...@basho.com>.
Min,

Are you calculating the MD5 or letting the Amazon client do it?

Thanks,
-John

On Jun 6, 2013, at 4:54 PM, Min Chen <mi...@citrix.com> wrote:

> Thanks Tom. Indeed I have a S3 question that need some advise from some S3
> experts. To support upload object > 5G, I have used TransferManager.upload
> to upload object to S3, upload went fine and object are successfully put
> to S3. However, later on when I am using "s3cmd get <object key>" to
> retrieve this object, I always got this exception:
> 
> "MD5 signatures do not match: computed=Y, received="X"
> 
> It seems that Amazon S3 kept a different Md5 sum for the multi-part
> uploaded object. We have been using Riak CS for our S3 testing. If I
> changed to not using multi-part upload and directly invoking S3 putObject,
> I will not run into this issue. Do you have such experience before?
> 
> -min
> 
> On 6/6/13 1:56 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:
> 
>> Thanks Min. I've printed out the material and am reading new threads.
>> Can't comment much yet until I understand things a bit more.
>> 
>> Meanwhile, feel free to hit me up with any S3 questions you have. I'm
>> looking forward to playing with the object_store branch and testing it
>> out.
>> 
>> Tom.
>> 
>> On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
>>> Welcome Tom. You can check out this FS
>>> 
>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Obj
>>> ec
>>> t+Store+Plugin+Framework for secondary storage architectural work done
>>> in
>>> object_store branch.You may also check out the following recent threads
>>> regarding 3 major technical questions raised by community as well as our
>>> answers and clarification.
>>> 
>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77
>>> B3
>>> 37AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
>>> 
>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCD
>>> D2
>>> 2955.3DDDC%25min.chen%40citrix.com%3E
>>> 
>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCD
>>> D2
>>> 300D.3DE0C%25min.chen%40citrix.com%3E
>>> 
>>> 
>>> That branch is mainly worked on by Edison and me, and we are at PST
>>> timezone. 
>>> 
>>> Thanks
>>> -min
>> -- 
>> Cloudian KK - http://www.cloudian.com/get-started.html
>> Fancy 100TB of full featured S3 Storage?
>> Checkout the Cloudian® Community Edition!
>> 
> 


Re: Object based Secondary storage.

Posted by Min Chen <mi...@citrix.com>.
Thanks Tom. Indeed I have an S3 question that needs some advice from S3
experts. To support uploading objects > 5G, I have used TransferManager.upload
to upload objects to S3; the upload went fine and the objects were successfully
put to S3. However, later on, when I use "s3cmd get <object key>" to
retrieve such an object, I always get this exception:

"MD5 signatures do not match: computed=Y, received="X"

It seems that Amazon S3 keeps a different MD5 sum for a multi-part
uploaded object. We have been using Riak CS for our S3 testing. If I
change to not using multi-part upload and instead invoke S3 putObject
directly, I do not run into this issue. Do you have such experience before?
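
For reference, the upload path looks essentially like this (a minimal
sketch only; the bucket, key, and file names are placeholders, and
credential/endpoint setup is omitted):

import java.io.File;

import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.Upload;

public class UploadSketch {
    public static void main(String[] args) throws Exception {
        // TransferManager decides on its own whether to use multipart
        // upload, which is what large (> 5G) objects require.
        TransferManager tm = new TransferManager(new AmazonS3Client());
        Upload upload = tm.upload("some-bucket", "some/key",
                new File("/path/to/large-object"));
        upload.waitForCompletion(); // blocks until all parts are done
        tm.shutdownNow();
    }
}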

-min

On 6/6/13 1:56 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:

>Thanks Min. I've printed out the material and am reading new threads.
>Can't comment much yet until I understand things a bit more.
>
>Meanwhile, feel free to hit me up with any S3 questions you have. I'm
>looking forward to playing with the object_store branch and testing it
>out.
>
>Tom.
>
>On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
>> Welcome Tom. You can check out this FS
>> 
>>https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Obj
>>ec
>> t+Store+Plugin+Framework for secondary storage architectural work done
>>in
>> object_store branch.You may also check out the following recent threads
>> regarding 3 major technical questions raised by community as well as our
>> answers and clarification.
>> 
>>http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77
>>B3
>> 37AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
>> 
>>http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCD
>>D2
>> 2955.3DDDC%25min.chen%40citrix.com%3E
>> 
>>http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCD
>>D2
>> 300D.3DE0C%25min.chen%40citrix.com%3E
>> 
>> 
>> That branch is mainly worked on by Edison and me, and we are at PST
>> timezone. 
>> 
>> Thanks
>> -min
>-- 
>Cloudian KK - http://www.cloudian.com/get-started.html
>Fancy 100TB of full featured S3 Storage?
>Checkout the Cloudian® Community Edition!
>


Re: Object based Secondary storage.

Posted by Thomas O'Dowd <tp...@cloudian.com>.
Thanks Min. I've printed out the material and am reading the new threads.
Can't comment much yet until I understand things a bit more.

Meanwhile, feel free to hit me up with any S3 questions you have. I'm
looking forward to playing with the object_store branch and testing it
out.

Tom.

On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
> Welcome Tom. You can check out this FS
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Objec
> t+Store+Plugin+Framework for secondary storage architectural work done in
> object_store branch.You may also check out the following recent threads
> regarding 3 major technical questions raised by community as well as our
> answers and clarification.
> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77B3
> 37AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2
> 2955.3DDDC%25min.chen%40citrix.com%3E
> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2
> 300D.3DE0C%25min.chen%40citrix.com%3E
> 
> 
> That branch is mainly worked on by Edison and me, and we are at PST
> timezone. 
> 
> Thanks
> -min
-- 
Cloudian KK - http://www.cloudian.com/get-started.html
Fancy 100TB of full featured S3 Storage?
Checkout the Cloudian® Community Edition!


Re: Object based Secondary storage.

Posted by Min Chen <mi...@citrix.com>.
Welcome Tom. You can check out this FS
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Objec
t+Store+Plugin+Framework for secondary storage architectural work done in
object_store branch. You may also check out the following recent threads
regarding 3 major technical questions raised by the community, as well as our
answers and clarifications.
http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77B3
37AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2
2955.3DDDC%25min.chen%40citrix.com%3E
http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2
300D.3DE0C%25min.chen%40citrix.com%3E


That branch is mainly worked on by Edison and me, and we are in the PST
timezone. 

Thanks
-min

On 6/5/13 2:31 AM, "Thomas O'Dowd" <tp...@cloudian.com> wrote:

>Hi all,
>
>I'm new here. I'm interested in Cloudstack Secondary storage using S3
>object stores. I checked out and built cloudstack today and found the
>object_store branch (not built it yet). I haven't done Java since 2004
>(mostly erlang/C++/python) so I'm rusty but I know the finer parts of
>S3 :-)
>
>Anyway - I'm thinking I can help in my spare time. Any pointers to the
>new object store secondary storage design are appreciated. Someone on
>IRC already pointed out the merge request mail archive which I've read.
>What timezone are the main folks working on in? I'm GMT+9.
>
>Tom.
>-- 
>Cloudian KK - http://www.cloudian.com/get-started.html
>Fancy 100TB of full featured S3 Storage?
>Checkout the Cloudian® Community Edition!
>