Posted to dev@cloudstack.apache.org by Zoran Rajic <zo...@ddn.com> on 2013/10/25 04:21:50 UTC
Q: Cloudstack S3 performance

Hello fellow Cloudstack users,

This is my first post to this mailing list, so please excuse me if I'm not
following the proper etiquette.

My name is Zoran, and I'm a developer at DDN.  While investigating the
Cloudstack S3 server and its performance, I encountered some odd behavior,
so I wanted to verify my findings with you guys.

To test the S3 performance, I used Python and the boto library to write an
S3 client that adds keys with random names and contents to a single bucket
on the Cloudstack S3 server.  To my surprise, the bucket became
increasingly unresponsive as it grew.  For example, at about 20,000 keys it
took about 10 seconds to "get a bucket" (boto ref
http://boto.s3.amazonaws.com/ref/s3.html#boto.s3.connection.S3Connection.get_bucket,
AWS ref
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html).
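For reference, the measurement can be sketched as a small timing harness.  This is a minimal sketch, not my exact test script: to keep it self-contained, it takes two callables instead of a live connection, `put_key(name, data)` and `get_bucket()` (hypothetical parameter names).  With boto 2 they could be wired to `bucket.new_key(name).set_contents_from_string(data)` and `conn.get_bucket(bucket_name)`; the demo below uses an in-memory dict instead, so the numbers it prints say nothing about Cloudstack itself.

```python
import os
import time
import uuid
from typing import Callable, List, Tuple


def fill_and_time(put_key: Callable[[str, bytes], None],
                  get_bucket: Callable[[], object],
                  total_keys: int,
                  probe_every: int,
                  value_size: int = 64) -> List[Tuple[int, float]]:
    """Add `total_keys` keys with random names and contents to one bucket.

    After every `probe_every` inserts, time a single get_bucket() call and
    record (keys_so_far, elapsed_seconds).
    """
    samples: List[Tuple[int, float]] = []
    for i in range(1, total_keys + 1):
        # Random key name and random payload, as in the test described above.
        put_key(uuid.uuid4().hex, os.urandom(value_size))
        if i % probe_every == 0:
            start = time.perf_counter()
            get_bucket()
            samples.append((i, time.perf_counter() - start))
    return samples


if __name__ == "__main__":
    # In-memory stand-in for a bucket (hypothetical); swap in real S3 calls
    # to benchmark an actual server.
    store = {}
    for n, secs in fill_and_time(store.__setitem__, lambda: sorted(store),
                                 total_keys=20000, probe_every=5000):
        print(f"{n:>6} keys: get_bucket took {secs * 1000:.2f} ms")
```

Probing at fixed intervals makes it easy to see whether the "get bucket" latency grows with the number of keys, which is the pattern I observed.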

Has anyone else noticed these significant slowdowns, or is it perhaps
just my environment and a possible misconfiguration?


Not content to leave it at that, I tried to locate the delay on the
Cloudstack S3 server side, and I may have found two potential issues with
SObjectDaoImpl.listBucketObjects() (ref
https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;a=blob;f=awsapi/src/com/cloud/bridge/persist/dao/SObjectDaoImpl.java;h=6d23757b8b57ded9443bfe61aaa3742590b21c49;hb=master#l71):

1) the maxKeys parameter seems to be ignored, so all 20,000+ keys (i.e.,
the full bucket contents) were being inspected instead of just the first
1,000 keys, which is the default maxKeys value

2) the object data seems to be extracted from MySQL with a separate query
for each object instead of a JOIN, i.e. something similar to this:
>  objList = SQL("select * from SObject where SBucketID='xxx' ");
>  for (ObjectVO obj : objList) {
>     objItem = SQL("select * from SObject_Item where SObjectID='yyy' ");
>  }

Note that the data can be retrieved "in one go", and a lot more
efficiently, with a JOIN on the database side, i.e.:
>  select * from sobject so LEFT JOIN sobject_item si on
>  so.ID=si.SObjectID where so.SBucketID='xxx';
However, I am not sure whether the Cloudstack DAO and *VO object
abstraction supports database JOINs.
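To illustrate both points, here is a minimal, self-contained sketch.  It uses Python's built-in sqlite3 module as a stand-in for MySQL, and a hypothetical, simplified schema loosely based on the table and column names mentioned above (sobject, sobject_item, SBucketID, SObjectID); the NameKey, Version and StoredSize columns and the helper names are my own illustrative assumptions, not the actual CloudStack awsapi schema or code.  `list_n_plus_one` mirrors the pattern from item 2 (one query for the bucket, then one query per object, with no maxKeys limit), while `list_joined` fetches a page of at most `max_keys` objects plus their items in a single query.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified schema (NOT the real CloudStack awsapi schema).
SCHEMA = """
CREATE TABLE sobject (ID INTEGER PRIMARY KEY, SBucketID INTEGER, NameKey TEXT);
CREATE TABLE sobject_item (ID INTEGER PRIMARY KEY, SObjectID INTEGER,
                           Version TEXT, StoredSize INTEGER);
CREATE INDEX idx_obj_bucket ON sobject (SBucketID, NameKey);
CREATE INDEX idx_item_obj ON sobject_item (SObjectID);
"""

Row = Tuple[str, Optional[str], Optional[int]]  # (NameKey, Version, StoredSize)


def make_demo_db(n_objects: int, bucket_id: int = 1) -> sqlite3.Connection:
    """In-memory database with n_objects keys and one item row per object."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for i in range(n_objects):
        cur = conn.execute(
            "INSERT INTO sobject (SBucketID, NameKey) VALUES (?, ?)",
            (bucket_id, f"key-{i:06d}"))
        conn.execute(
            "INSERT INTO sobject_item (SObjectID, Version, StoredSize) VALUES (?, ?, ?)",
            (cur.lastrowid, "v1", 100 + i))
    conn.commit()
    return conn


def count_selects(conn: sqlite3.Connection) -> Dict[str, int]:
    """Install a trace callback that counts SELECT statements run by SQLite."""
    counter = {"selects": 0}

    def trace(sql: str) -> None:
        if sql.lstrip().upper().startswith("SELECT"):
            counter["selects"] += 1

    conn.set_trace_callback(trace)
    return counter


def list_n_plus_one(conn: sqlite3.Connection, bucket_id: int) -> List[Row]:
    """Every object in the bucket (maxKeys ignored), plus one query per object."""
    rows: List[Row] = []
    objects = conn.execute(
        "SELECT ID, NameKey FROM sobject WHERE SBucketID = ? ORDER BY NameKey",
        (bucket_id,)).fetchall()
    for obj_id, name in objects:
        items = conn.execute(
            "SELECT Version, StoredSize FROM sobject_item WHERE SObjectID = ?",
            (obj_id,)).fetchall()
        # Emit (name, None, None) for objects without items, like a LEFT JOIN.
        for version, size in items or [(None, None)]:
            rows.append((name, version, size))
    return rows


def list_joined(conn: sqlite3.Connection, bucket_id: int,
                max_keys: int = 1000, marker: str = "") -> List[Row]:
    """One query: at most max_keys objects after `marker`, LEFT JOINed to items."""
    return conn.execute(
        """
        SELECT so.NameKey, si.Version, si.StoredSize
        FROM (SELECT ID, NameKey FROM sobject
              WHERE SBucketID = ? AND NameKey > ?
              ORDER BY NameKey LIMIT ?) AS so
        LEFT JOIN sobject_item si ON si.SObjectID = so.ID
        ORDER BY so.NameKey, si.Version
        """,
        (bucket_id, marker, max_keys)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db(5000)
    counter = count_selects(conn)
    full = list_n_plus_one(conn, 1)
    print("N+1: ", len(full), "rows,", counter["selects"], "SELECTs")
    counter["selects"] = 0
    page = list_joined(conn, 1)
    print("JOIN:", len(page), "rows,", counter["selects"], "SELECTs")
```

The LIMIT sits in a derived table before the JOIN so that maxKeys counts objects rather than object/item rows; MySQL accepts LIMIT inside a derived table, although not inside an IN (...) subquery.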

Can someone confirm if these are actual code issues?

Thank you in advance!


Best regards,
	Zoran