Posted to users@cloudstack.apache.org by ch...@zv.fraunhofer.de on 2016/04/29 16:40:36 UTC

S3 secondary storage performance considerations

Hi,

We plan to switch from per-zone NFS secondary storage to (region-wide) S3 secondary storage. However, we currently have some concerns regarding the performance of this setup.

The throughput we achieve for, e.g., a single file download (wget http://our.s3.storage/some-file) is approx. 100 Mbps. Given the nature of an object store, this behavior is quite normal. High performance is only achieved through multipart up-/download, a feature exploited by most tools that natively “speak” the S3 protocol.
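
As a rough illustration of how a large object is split for a multipart transfer, the following Python sketch plans part size and part count. It assumes the limits documented for AWS S3 (5 MiB minimum part size, at most 10,000 parts per upload); an on-premises S3-compatible store may use different limits. The function name plan_parts and the 64 MiB preferred part size are hypothetical choices for this example, not CloudStack settings.

```python
MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB   # assumption: AWS S3 minimum part size (last part may be smaller)
MAX_PARTS = 10_000        # assumption: AWS S3 maximum number of parts per upload


def plan_parts(object_size, preferred_part_size=64 * MIB):
    """Return (part_size, part_count) in bytes/parts for an object of object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Smallest part size that still fits the object into MAX_PARTS parts
    # (integer ceiling division avoids floating-point rounding).
    smallest_allowed = (object_size + MAX_PARTS - 1) // MAX_PARTS
    part_size = max(preferred_part_size, smallest_allowed, MIN_PART_SIZE)
    part_count = (object_size + part_size - 1) // part_size
    return part_size, part_count


if __name__ == "__main__":
    size, count = plan_parts(10 ** 12)  # a 1 TB volume (decimal terabyte)
    print(f"part size: {size / MIB:.1f} MiB, parts: {count}")
```

Under these assumed limits, a 1 TB object cannot use the preferred 64 MiB parts and needs parts of about 100 MB each; the parts can then be transferred in parallel rather than as one sequential stream.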

At the same time, however, we have a couple of “power users”. These users frequently require data volumes of up to 1 TB and also enable recurring daily snapshots of these volumes. Thus, we produce quite a lot of snapshot data (approx. 6-10 TB per day).

Given that, we are interested in understanding the following:
- Is anybody successfully using S3 secondary storage with a similar or even higher amount of data? If so, is there anything in particular we have to consider?
- Is the CloudStack synchronization process single- or multi-threaded? That is, if we produce 6-10 TB of data per day to be stored on the S3 object store, copying it there might take longer than one day if CloudStack copies the data sequentially, file by file.
- Does CloudStack support multipart uploads? As the single-file download example above suggests, uploading a 1 TB file to S3 will take forever if this is not supported.
- Is there any advice on sizing the secondary staging storage per zone, e.g. depending on the primary storage volume or the number of VMs?
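
To put the concern about sequential copying into numbers, here is a back-of-the-envelope estimate in Python. It assumes that the throughput quoted above means 100 megabits per second per stream, that 1 TB is 10^12 bytes, and that parallel streams scale linearly, which real systems rarely do. transfer_hours is a hypothetical helper for this calculation only, not part of CloudStack.

```python
TB = 10 ** 12  # decimal terabyte, an assumption about the units used in the post


def transfer_hours(data_bytes, mbit_per_s, streams=1):
    """Hours needed to move data_bytes at mbit_per_s per stream over `streams` parallel streams."""
    if mbit_per_s <= 0 or streams < 1:
        raise ValueError("mbit_per_s must be positive and streams at least 1")
    # Convert megabits per second to bytes per second, then scale by stream count.
    bytes_per_second = mbit_per_s * 1_000_000 / 8 * streams
    return data_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    for daily_tb in (6, 10):
        for streams in (1, 8):
            hours = transfer_hours(daily_tb * TB, 100, streams)
            print(f"{daily_tb} TB over {streams} stream(s): {hours:.1f} h")
```

Under these assumptions, a single 100 Mbit/s stream needs roughly 133 hours for 6 TB, so one day's snapshots would take several days to copy unless transfers run in parallel or each stream is considerably faster.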

Just in case it matters, we still use CloudStack 4.5.1 with VMware 5.5 as hypervisor.

Thanks in advance for any help.

Best regards,
Christian

Re: S3 secondary storage performance considerations

Posted by ch...@zv.fraunhofer.de.

Dear Sebastián,

Thanks for the hint. However, as we operate an object store in our own data center and do not rely on AWS, network traffic cost is not an issue.
I’m more worried about the amount of data the SSVM has to push to and pull from the object store via the S3 interface in our case, and whether this setup will really scale.
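
One way to frame the scaling question is the sustained throughput needed just to keep up with the daily snapshot volume. The Python sketch below computes it, assuming decimal units (1 TB = 10^12 bytes) and a full 24-hour transfer window; required_mbit_per_s is a hypothetical helper, and the result says nothing about what an SSVM can actually deliver.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_mbit_per_s(daily_bytes, window_seconds=SECONDS_PER_DAY):
    """Sustained throughput in Mbit/s needed to move daily_bytes within window_seconds."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return daily_bytes * 8 / window_seconds / 1_000_000


if __name__ == "__main__":
    TB = 10 ** 12  # decimal terabyte (assumption)
    for daily_tb in (6, 10):
        rate = required_mbit_per_s(daily_tb * TB)
        print(f"{daily_tb} TB/day needs about {rate:.0f} Mbit/s sustained")
```

With these assumptions, 6-10 TB per day corresponds to roughly 556-926 Mbit/s sustained around the clock, before counting any reads back from the object store.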

Best regards
Christian 


> On 01 May 2016, at 22:37, Sebastian Gomez <ti...@gmail.com> wrote:
> 
> This does not directly answer any of your questions, but one piece of advice:
> 
> If you are talking about using AWS S3 as backend storage, it is highly
> advisable to consider the cost of storage network traffic. I have
> faced some cases where the price of outgoing network traffic
> (from AWS to the Internet) is greater than the cost of the infrastructure
> itself. In those cases we recommended placing strategic servers on AWS to
> reduce the amount of outgoing network traffic.
> 
> Regards.
> 
> Sincerely,
> Sebastián Gómez

Re: S3 secondary storage performance considerations

Posted by Sebastian Gomez <ti...@gmail.com>.

This does not directly answer any of your questions, but one piece of advice:

If you are talking about using AWS S3 as backend storage, it is highly
advisable to consider the cost of storage network traffic. I have
faced some cases where the price of outgoing network traffic
(from AWS to the Internet) is greater than the cost of the infrastructure
itself. In those cases we recommended placing strategic servers on AWS to
reduce the amount of outgoing network traffic.

Regards.

Sincerely,
Sebastián Gómez
