Posted to dev@cloudstack.apache.org by Andrei Mikhailovsky <an...@arhont.com> on 2014/03/20 00:59:46 UTC

ACS and KVM uses /tmp for volumes migration and templates

Hi guys, 

I was wondering if this is a bug? 

I've noticed that during volume migration from NFS to RBD primary storage, the volume image is first copied to /tmp and only then to the RBD storage. This seems silly to me, as one would expect a typical volume to be larger than the host's hard disk. Also, it is common practice to use tmpfs for /tmp for performance reasons. Thus, a typical host server will have a /tmp that is far smaller than an average volume. As a result, volume migration would break after filling /tmp and could probably cause a bunch of issues for the KVM host itself as well as any VMs running on the server.

It also seems that /tmp is used temporarily during template creation.
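
The concern above comes down to comparing the volume size with the free space in the staging directory. A minimal sketch of that check, assuming a hypothetical helper name (fits_in_staging) that is not part of ACS:

```python
import shutil


def fits_in_staging(volume_bytes, staging_dir="/tmp"):
    """Return True if staging_dir has at least volume_bytes free.

    Hypothetical helper for illustration only; ACS does not expose
    a function with this name.
    """
    free_bytes = shutil.disk_usage(staging_dir).free
    return volume_bytes <= free_bytes
```

With /tmp on tmpfs, the free space reported here is bounded by the tmpfs size limit, so a large volume would fail this check well before the local disk is full.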

My setup: 

ACS 4.2.1 
Ubuntu 12.04 with KVM 
RBD + NFS for Primary storage 
NFS for Staging and Secondary storage 


Thanks 

Andrei 

Re: ACS and KVM uses /tmp for volumes migration and templates

Posted by Andrei Mikhailovsky <an...@arhont.com>.
Wido, 


Thanks, I will give it a try when I have a moment. 


Andrei 



Re: ACS and KVM uses /tmp for volumes migration and templates

Posted by Wido den Hollander <wi...@widodh.nl>.
On 03/24/2014 03:22 PM, Andrei Mikhailovsky wrote:
>
> Do you think I can apply the patch manually to the 4.3 branch? I would love to try it with 4.3, but I'm not adventurous enough to upgrade my setup to 4.4 yet ))
>

Yes! I just pushed a commit to the master branch: 
https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;a=commit;h=9763faf85e3f54ac84d5ca1d5ad6e89c7fcc87ee

To build 4.3 with the fix:

$ git checkout 4.3
$ git cherry-pick 9763faf85e3f54ac84d5ca1d5ad6e89c7fcc87ee
$ dpkg-buildpackage

Now you only have to update the cloudstack-agent package on the hypervisors.

Wido



Re: ACS and KVM uses /tmp for volumes migration and templates

Posted by Andrei Mikhailovsky <an...@arhont.com>.
Do you think I can apply the patch manually to the 4.3 branch? I would love to try it with 4.3, but I'm not adventurous enough to upgrade my setup to 4.4 yet )) 


Andrei 



Re: ACS and KVM uses /tmp for volumes migration and templates

Posted by Wido den Hollander <wi...@widodh.nl>.
On 03/23/2014 08:01 PM, Andrei Mikhailovsky wrote:
> Wido,
>
> Could you please let me know when you've done this so I can try it out? Would it be part of the 4.3 branch or 4.4?
>

I'll do that. It will go into master, which is 4.4, and I'm not sure whether 
it will be backported to 4.3.1.

Wido



Re: ACS and KVM uses /tmp for volumes migration and templates

Posted by Andrei Mikhailovsky <an...@arhont.com>.
Wido, 

Could you please let me know when you've done this so I can try it out? Would it be part of the 4.3 branch or 4.4? 

Thanks 


Re: ACS and KVM uses /tmp for volumes migration and templates

Posted by Wido den Hollander <wi...@widodh.nl>.

On 03/21/2014 02:23 PM, Andrei Mikhailovsky wrote:
>
> Wido,
>
>
> I would be happy to try the custom ACS build unless 4.3 comes out soon. It has been overdue for some time now )). Has this feature been addressed in the 4.3 release?
>

No, it hasn't been fixed yet. I have to admit, I forgot about this until 
you sent this e-mail to the list.

I'll fix this in master later this week.

>
> I can live with this feature for the time being, but I do see a longer-term issue when my volumes become large, as I've only got about 100 GB of free space on my host servers.
>
>

I fully agree. While writing this code I was aware of this. See my 
comments in the code: 
https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;a=blob;f=plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/storage/LibvirtStorageAdaptor.java;h=5de8bd26ae201187f5db5fd16b7e3ca157cab53a;hb=master#l1087

> From what I can tell from the rbd ls -l output, all of my volumes are in format 2.
>

Correct, because I bypass libvirt and QEMU in some places right now.


Re: ACS and KVM uses /tmp for volumes migration and templates

Posted by Andrei Mikhailovsky <an...@arhont.com>.
Wido, 


I would be happy to try the custom ACS build unless 4.3 comes out soon. It has been overdue for some time now )). Has this feature been addressed in the 4.3 release? 


I can live with this feature for the time being, but I do see a longer-term issue when my volumes become large, as I've only got about 100 GB of free space on my host servers. 


From what I can tell from the rbd ls -l output, all of my volumes are in format 2. 


Cheers, 


Andrei 




----- Original Message -----

From: "Wido den Hollander" <wi...@widodh.nl> 
To: dev@cloudstack.apache.org 
Sent: Thursday, 20 March, 2014 9:40:29 AM 
Subject: Re: ACS and KVM uses /tmp for volumes migration and templates 

On 03/20/2014 12:59 AM, Andrei Mikhailovsky wrote: 
> Hi guys, 
> 
> I was wondering if this is a bug? 
> 

No, it's a "feature". 

> I've noticed that during volume migration from NFS to RBD primary storage, the volume image is first copied to /tmp and only then to the RBD storage. This seems silly to me, as one would expect a typical volume to be larger than the host's hard disk. Also, it is common practice to use tmpfs for /tmp for performance reasons. Thus, a typical host server will have a /tmp that is far smaller than an average volume. As a result, volume migration would break after filling /tmp and could probably cause a bunch of issues for the KVM host itself as well as any VMs running on the server. 
> 

Correct. The problem is that RBD images come in two formats: format 1 
(old/legacy) and format 2. 

To be cloned, images must be in RBD format 2. 

When running qemu-img convert with an RBD image as the destination, qemu-img 
will create an RBD image in format 1. 

That's due to this piece of code in block/rbd.c in QEMU: 

ret = rbd_create(io_ctx, name, bytes, &obj_order); 

rbd_create() creates images in format 1. To use format 2 you should use 
rbd_create2() or rbd_create3(). 

With RBD format 1 we can't do snapshotting or cloning, which we require 
in ACS. 

So I had to add an intermediate step where I first wrote the RAW image 
somewhere and afterwards wrote it to RBD. 

After some discussion, a config option was added to Ceph: 

OPTION(rbd_default_format, OPT_INT, 1) 

This allows me to do this: 

qemu-img convert .. -O raw .. rbd:rbd/myimage:rbd_default_format=2 

This causes librbd/RBD to create a format 2 image, so we can skip the 
intermediate conversion to /tmp. 

This option is available since Ceph Dumpling 0.67.5 and was not 
available when ACS 4.2 was written. 

I'm going to make changes in master which skip the step with /tmp. 

Technically this can be backported to 4.2, but then you would have to 
run your own homebrew version of 4.2. 

> It also seems that the /tmp is temporarily used during a template creation . 
> 

Same story as above. 

> My setup: 
> 
> ACS 4.2.1 
> Ubuntu 12.04 with KVM 
> RBD + NFS for Primary storage 
> NFS for Staging and Secondary storage 
> 
> 
> Thanks 
> 
> Andrei 
> 



Re: ACS and KVM uses /tmp for volumes migration and templates

Posted by Wido den Hollander <wi...@widodh.nl>.
On 03/20/2014 12:59 AM, Andrei Mikhailovsky wrote:
> Hi guys,
>
> I was wondering if this is a bug?
>

No, it's a "feature".

> I've noticed that during volume migration from NFS to RBD primary storage the volume image is first copied to /tmp and only then to the RBD storage. This seems silly to me as one would expect a typical volume to be larger than the host's hard disk. Also, it is a common practice to use tmpfs as /tmp for performance reasons. Thus, a typical host server will have far smaller /tmp folder than the size of an average volume. As a result, volume migration would break after filling the /tmp and could probably cause a bunch of issue for the KVM host itself as well as any vms running on the server.
>

Correct. The problem was that RBD images come in two formats: format 1 
(old/legacy) and format 2.

Cloning requires images to be in RBD format 2.

When qemu-img convert runs with an RBD image as the destination, it 
creates the RBD image in format 1.

That's due to this piece of code in block/rbd.c in QEMU:

     ret = rbd_create(io_ctx, name, bytes, &obj_order);

rbd_create() creates images in format 1. To use format 2 you should use 
rbd_create2() or rbd_create3().

With RBD format 1 we can't do snapshotting or cloning, which we require 
in ACS.
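
To check which format an existing image uses, the rbd command-line tool can be queried and its output parsed. This is only a minimal sketch: it assumes the installed rbd CLI accepts "--format json" for "rbd info" and reports a "format" key, and both helper names are hypothetical.

```python
import json
import subprocess


def image_format_from_info(info_json: str) -> int:
    """Extract the RBD image format (1 or 2) from `rbd info` JSON output.

    Assumes the JSON document has a top-level "format" key.
    """
    info = json.loads(info_json)
    return int(info["format"])


def query_image_format(pool: str, image: str) -> int:
    """Run `rbd info <pool>/<image> --format json` and return the image format.

    Requires a working Ceph client configuration on the host.
    """
    result = subprocess.run(
        ["rbd", "info", f"{pool}/{image}", "--format", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return image_format_from_info(result.stdout)
```

Calling query_image_format("rbd", "myimage") on a KVM host would then return 2 for an image that can be cloned.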

So I had to add an intermediate step where I first wrote the RAW image 
somewhere and afterwards wrote it to RBD.

After some discussion, a config option was added to Ceph:

OPTION(rbd_default_format, OPT_INT, 1)

This allows me to do this:

qemu-img convert .. -O raw .. rbd:rbd/myimage:rbd_default_format=2

This causes librbd/RBD to create a format 2 image, so we can skip the 
intermediate conversion to /tmp.
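
The direct conversion described above could be driven from a script roughly as follows. This is a sketch, not the code ACS uses: the function name, the source path and the source format are placeholders, and only the rbd:pool/image:rbd_default_format=2 target syntax comes from the command shown above.

```python
import subprocess


def build_convert_command(source_path, source_format, pool, image, rbd_format=2):
    """Build a qemu-img argv that writes straight to an RBD image.

    The rbd_default_format option is appended to the RBD target so that
    librbd creates the destination image in the requested format.
    """
    target = f"rbd:{pool}/{image}:rbd_default_format={rbd_format}"
    return [
        "qemu-img", "convert",
        "-f", source_format,   # format of the source image on NFS
        "-O", "raw",           # RBD images hold raw data
        source_path,
        target,
    ]


def convert_to_rbd(source_path, source_format, pool, image):
    """Run the conversion; raises CalledProcessError if qemu-img fails."""
    subprocess.run(
        build_convert_command(source_path, source_format, pool, image),
        check=True,
    )
```

Because the image is written directly to RBD, nothing lands in /tmp, so the size of the volume is no longer limited by the free space on the host.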

This option is available since Ceph Dumpling 0.67.5 and was not 
available when ACS 4.2 was written.
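
A caller that wants to fall back to the old /tmp path on older clusters could compare the Ceph version against 0.67.5 first. This is a rough sketch with hypothetical helper names; it assumes a plain dotted numeric version string and that every release numbered above 0.67.5 also has the option, which I have not verified for each release.

```python
MIN_VERSION = (0, 67, 5)  # first release with rbd_default_format, per the text above


def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as "0.67.5" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def has_rbd_default_format(version: str) -> bool:
    """Return True if the given Ceph version is at least 0.67.5."""
    return parse_version(version) >= MIN_VERSION
```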

I'm going to make changes in master which skip the step with /tmp.

Technically this can be backported to 4.2, but then you would have to 
run your own homebrew version of 4.2.

> It also seems that the /tmp is temporarily used during a template creation .
>

Same story as above.

> My setup:
>
> ACS 4.2.1
> Ubuntu 12.04 with KVM
> RBD + NFS for Primary storage
> NFS for Staging and Secondary storage
>
>
> Thanks
>
> Andrei
>


RE: ACS and KVM uses /tmp for volumes migration and templates

Posted by Edison Su <Ed...@citrix.com>.
Which version of CloudStack are you using? It seems that in 4.2 Wido enhanced RBD support a lot, so qemu-img itself can copy a volume from NFS to RBD without a temporary copy in the /tmp folder. 

> -----Original Message-----
> From: Andrei Mikhailovsky [mailto:andrei@arhont.com]
> Sent: Wednesday, March 19, 2014 5:00 PM
> To: dev@cloudstack.apache.org
> Subject: ACS and KVM uses /tmp for volumes migration and templates
> 
> Hi guys,
> 
> I was wondering if this is a bug?
> 
> I've noticed that during volume migration from NFS to RBD primary storage
> the volume image is first copied to /tmp and only then to the RBD storage.
> This seems silly to me as one would expect a typical volume to be larger than
> the host's hard disk. Also, it is a common practice to use tmpfs as /tmp for
> performance reasons. Thus, a typical host server will have far smaller /tmp
> folder than the size of an average volume. As a result, volume migration
> would break after filling the /tmp and could probably cause a bunch of issue
> for the KVM host itself as well as any vms running on the server.
> 
> It also seems that the /tmp is temporarily used during a template creation .
> 
> My setup:
> 
> ACS 4.2.1
> Ubuntu 12.04 with KVM
> RBD + NFS for Primary storage
> NFS for Staging and Secondary storage
> 
> 
> Thanks
> 
> Andrei