Posted to commits@cloudstack.apache.org by GitBox <gi...@apache.org> on 2021/11/18 00:43:01 UTC

[GitHub] [cloudstack] alexandru-bagu opened a new issue #5697: Storage bandwidth is wasted during template uploads/imports

alexandru-bagu opened a new issue #5697:
URL: https://github.com/apache/cloudstack/issues/5697

At the moment, when a template is being imported (via URL or upload), there is a phase called "Installing template" or something similar. From what I noticed, this phase computes a hash by reading the downloaded file back over the storage network and then saves the hash to the database.
During this phase the template is not usable, and I believe that should not be the case. I understand that the hash has a purpose and I am not saying it should be removed, but I believe it should be an optional task that runs in the background.
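A minimal sketch of the background, rate-limited hashing suggested here, in Python. MD5, the 1 MiB chunk size, the helper names and running the job on a thread pool are all assumptions for illustration; this is not CloudStack's actual SSVM code.

```python
import hashlib
import time
from concurrent.futures import Executor, Future


def throttled_file_hash(path, max_bytes_per_sec, chunk_size=1024 * 1024, algorithm="md5"):
    """Hash a file while capping its average read rate (hypothetical helper).

    After each chunk, sleep until the elapsed time matches what the byte
    budget allows, so the job cannot saturate the storage network.
    """
    digest = hashlib.new(algorithm)
    start = time.monotonic()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            total += len(chunk)
            # Earliest time at which `total` bytes may have been read at the capped rate.
            allowed = total / max_bytes_per_sec
            elapsed = time.monotonic() - start
            if allowed > elapsed:
                time.sleep(allowed - elapsed)
    return digest.hexdigest()


def hash_in_background(executor: Executor, path, max_bytes_per_sec) -> Future:
    """Schedule the throttled hash so the template could be marked usable right away."""
    return executor.submit(throttled_file_hash, path, max_bytes_per_sec)
```

For example, `hash_in_background(ThreadPoolExecutor(max_workers=1), "/mnt/secondary/template.qcow2", 50 * 1024 * 1024)` returns a `Future` (the path and the 50 MiB/s cap are made up); the caller would store the checksum once `future.result()` is available.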

In the following examples I do not consider disk speeds, just network speed/bandwidth.
For normal templates this is not necessarily noticeable. Consider this scenario (best case):
Template size: 4GB
Ingress bandwidth: 100 Mb/s
Storage bandwidth: 1 Gb/s
Download time required: 5.45 seconds
Installing template phase time required: ~0.6 seconds

Most templates (not ISOs), however, are considerably larger; some can be as large as 500 GB (I do have a few very large templates that I have to import).
In such a case, the installing template phase would take about 75 seconds (best case).

During this time (the installing template phase), the whole bandwidth available to the storage network (if it is even on a separate NIC) would be used up by this process, resulting in bad performance for the cluster.

Possible ways to fix this:
1. Compute the hash while the transfer is happening.
2. Make it optional (maybe even opt-in) and run it only in the background (maybe even with a limit on the bandwidth it uses).
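Option 1 can be sketched in Python as a copy loop that feeds each chunk to the digest before writing it, so the checksum is ready when the transfer ends and no second read over the storage network is needed. The function name, MD5 and the chunk size are assumptions; this is not how the SSVM is actually implemented.

```python
import hashlib


def copy_and_hash(src, dst, chunk_size=1024 * 1024, algorithm="md5"):
    """Copy binary stream src into dst while hashing the same bytes (hypothetical helper).

    Each chunk is read once, fed to the digest and written out.
    Returns (hex_digest, bytes_copied).
    """
    digest = hashlib.new(algorithm)
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        dst.write(chunk)
        copied += len(chunk)
    return digest.hexdigest(), copied
```

With `urllib.request.urlopen(url)` as `src` and `open(path, "wb")` as `dst`, this would hash a download as it streams to storage.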

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@cloudstack.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [cloudstack] nvazquez commented on issue #5697: Storage bandwidth is wasted during template uploads/imports

nvazquez commented on issue #5697:
URL: https://github.com/apache/cloudstack/issues/5697#issuecomment-1029570281


   Thanks @alexandru-bagu - the installing template phase also extracts the downloaded template (for compressed templates) once the download has finished, which takes time as well. The data exchanged between the management server and the SSVM during installation consists only of commands and answers that check whether the installation has finished. Unfortunately, I don't think this task can be made optional - what do you think @rohityadavcloud @DaanHoogland?
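Even when extraction is needed for compressed templates, the checksum of the extracted file could be computed during extraction instead of in a separate pass. A minimal Python sketch, assuming gzip or bzip2 input selected by file extension; the helper name, the `OPENERS` table and MD5 are hypothetical choices:

```python
import bz2
import gzip
import hashlib
import os

# Hypothetical mapping from compressed-template suffix to a stream opener.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def extract_and_hash(compressed_path, output_path, chunk_size=1024 * 1024, algorithm="md5"):
    """Decompress to output_path and hash the decompressed bytes in the same pass."""
    suffix = os.path.splitext(compressed_path)[1].lower()
    opener = OPENERS.get(suffix)
    if opener is None:
        raise ValueError(f"unsupported compression suffix: {suffix!r}")
    digest = hashlib.new(algorithm)
    written = 0
    with opener(compressed_path, "rb") as src, open(output_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)  # digest of the extracted file, not of the archive
            dst.write(chunk)
            written += len(chunk)
    return digest.hexdigest(), written
```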



[GitHub] [cloudstack] DaanHoogland commented on issue #5697: Storage bandwidth is wasted during template uploads/imports

DaanHoogland commented on issue #5697:
URL: https://github.com/apache/cloudstack/issues/5697#issuecomment-1038758245


   > *For more context, we are currently looking for a way to import a template that has about 4 TB. Importing such a template alone would take a long time even with a 10GB connection. To have the system waste more bandwidth to compute a hash that is not even going to be validated any other time is not useful.
   
   Makes sense. The checksum will be used on initial download if you supply an expected checksum. The checksum of the extracted file will later be used when copying the template from store to store (which could be cross-zone).
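Both uses described here (checking a download against a user-supplied checksum, and checking a store-to-store copy against the checksum recorded for the extracted file) come down to comparing a freshly computed digest with a stored one. A minimal Python sketch; the helper names and MD5 are assumptions, not CloudStack's code:

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Read the whole file once and return its hex digest."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_checksum, algorithm="md5"):
    """True if the file's digest equals the expected hex string (case-insensitive)."""
    return file_checksum(path, algorithm) == expected_checksum.strip().lower()
```

Note that each verification in this sketch is itself one full read of the file, which is why the cost of hashing matters for very large templates.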



[GitHub] [cloudstack] alexandru-bagu commented on issue #5697: Storage bandwidth is wasted during template uploads/imports

alexandru-bagu commented on issue #5697:
URL: https://github.com/apache/cloudstack/issues/5697#issuecomment-1035531550


   > The optimisation you mention could be done, if you wish, but I would definitely opt for opt-out (maybe to be reversed by a global setting). It sounds strange to me that after downloading, the network would still be fully occupied by the status query that checks the status of the install. That should be minimal.
   
   I doubt it's the status check that uses the network. My thinking was that since templates are kept on shared storage (which is required, afaik), every operation the SSVM performs on them has to use network bandwidth. That means:
   1. While the template is downloading, traffic should look like this: [public net] -download-> [ssvm] -save-> [storage] (the SSVM downloads from the public network and writes to storage).
   2. Once the download is complete and the template is being extracted, traffic is probably: [storage] -read-> [ssvm] -process-> [storage] (the SSVM reads from storage and writes back to storage).
   3. While the hash is being computed, traffic is probably: [storage] -read-> [ssvm]; when it finishes reading and computing the hash, the SSVM pushes it to CloudStack.
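The three steps above can be turned into a rough tally of bytes crossing the SSVM-to-storage link. This is a simplified model under the assumptions in the list (shared storage, every read and write goes over the network, status polling ignored); the function name is hypothetical.

```python
def storage_link_bytes(template_size, compressed_size=None, separate_hash_pass=True):
    """Approximate bytes moved between the SSVM and storage for one import."""
    phases = {}
    if compressed_size is None:
        # Step 1: the downloaded (uncompressed) file is written to storage.
        phases["download write"] = template_size
    else:
        # Steps 1-2: the archive is written, read back and extracted to storage.
        phases["download write"] = compressed_size
        phases["extract read"] = compressed_size
        phases["extract write"] = template_size
    if separate_hash_pass:
        # Step 3: the final file is read back once more just to hash it.
        phases["hash read"] = template_size
    return phases
```

Under this model, `sum(storage_link_bytes(4 * 10**12).values())` is `8 * 10**12`, while with `separate_hash_pass=False` it is `4 * 10**12`: for an uncompressed template, hashing during the transfer would halve the storage traffic.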
   
   If the hash is computed separately from the download, additional bandwidth is used to read the whole file again just to compute it.
   Additionally, I hope that when the template is not compressed, the file is simply renamed rather than copied to its final name with the original then removed.
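For an uncompressed template, finalizing should only need a rename. A minimal Python sketch with a hypothetical helper name: `os.replace` renames without reading or rewriting the data when source and destination are on the same filesystem; across filesystems it fails with `EXDEV`, and this sketch then falls back to `shutil.move`, which does copy the bytes.

```python
import errno
import os
import shutil


def finalize_template(tmp_path, final_path):
    """Give a downloaded template its final name, preferring a metadata-only rename."""
    try:
        os.replace(tmp_path, final_path)  # same filesystem: no template data is copied
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # a real error (permissions, missing file, ...), not a cross-device move
        shutil.move(tmp_path, final_path)  # different filesystem: copy, then delete source
        return "copied"
```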
   
   *For more context, we are currently looking for a way to import a template that has about 4 TB. Importing such a template alone would take a long time even with a 10GB connection. To have the system waste more bandwidth to compute a hash that is not even going to be validated any other time is not useful.
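To put the 4 TB figure in perspective, every extra full read of a template costs at least size × 8 / link rate seconds. A small Python sketch, assuming decimal units (1 TB = 10^12 bytes), that "10GB connection" means a 10 Gb/s link, and that the read runs at full line rate with no protocol overhead; the function name is hypothetical:

```python
def full_read_seconds(size_bytes, link_bits_per_sec):
    """Lower bound on the time to read size_bytes once over a link of the given speed."""
    return size_bytes * 8 / link_bits_per_sec


seconds = full_read_seconds(4 * 10**12, 10 * 10**9)
print(f"{seconds:.0f} s (~{seconds / 60:.0f} min)")  # prints "3200 s (~53 min)"
```

Under these assumptions, one separate hash pass over such a template keeps the link saturated for roughly 53 minutes.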



[GitHub] [cloudstack] DaanHoogland commented on issue #5697: Storage bandwidth is wasted during template uploads/imports

DaanHoogland commented on issue #5697:
URL: https://github.com/apache/cloudstack/issues/5697#issuecomment-1029699976


   The optimisation you mention could be done, if you wish, but I would definitely opt for opt-out (maybe to be reversed by a global setting).
   It sounds strange to me that after downloading, the network would still be fully occupied by the status query that checks the status of the install. That should be minimal.

