Posted to dev@cloudstack.apache.org by Wido den Hollander <wi...@widodh.nl> on 2013/06/07 16:15:00 UTC

Orphaned libvirt storage pools

Hi,

So, I just created CLOUDSTACK-2893, but Wei Zhou mentioned that there 
are some related issues:
* CLOUDSTACK-2729
* CLOUDSTACK-2780

I restarted my Agent and the issue described in 2893 went away, but I'm 
wondering how that happened.

Anyway, after digging further I found that I have some "orphaned" storage 
pools, by which I mean they are mounted and in use, but not defined 
or active in libvirt:

root@n02:~# lsof |grep "\.iso"|awk '{print $9}'|cut -d '/' -f 3|sort -n|uniq
eb3cd8fd-a462-35b9-882a-f4b9f2f4a84c
f84e51ab-d203-3114-b581-247b81b7d2c1
fd968b03-bd11-3179-a2b3-73def7c66c68
7ceb73e5-5ab1-3862-ad6e-52cb986aff0d
7dc0149e-0281-3353-91eb-4589ef2b1ec1
8e005344-6a65-3802-ab36-31befc95abf3
88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593
765e63d7-e9f9-3203-bf4f-e55f83fe9177
1287a27d-0383-3f5a-84aa-61211621d451
98622150-41b2-3ba3-9c9c-09e3b6a2da03
root@n02:~#

Looking at libvirt:
root@n02:~# virsh pool-list
Name                 State      Autostart
-----------------------------------------
52801816-fe44-3a2b-a147-bb768eeea295 active     no
7ceb73e5-5ab1-3862-ad6e-52cb986aff0d active     no
88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593 active     no
a83d1100-4ffa-432a-8467-4dc266c4b0c8 active     no
fd968b03-bd11-3179-a2b3-73def7c66c68 active     no

root@n02:~#

What happens here is that the mountpoints are in use (an ISO attached to 
an Instance) but there is no corresponding storage pool in libvirt.

This means that when you try to deploy a second VM with the same ISO, 
libvirt will error out: the Agent tries to create and start a new 
storage pool, which fails because the mountpoint is already in use.

The remedy would be to put the hypervisor into maintenance, reboot it 
completely and then migrate Instances back to it.

In libvirt there is no way to start an NFS storage pool without libvirt 
mounting it.

Any suggestions on how we can work around this code-wise?
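
One direction for the Agent could be to check whether the target path 
is already mounted before it calls storagePoolCreateXML, and log that 
clearly instead of blindly trying to mount again. A minimal sketch of 
such a check (the class and method names are hypothetical, this is not 
existing Agent code):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Hypothetical helper: report whether a pool's target path already
    // appears as a mountpoint on the host, so the Agent can log it (or
    // skip the mount attempt) before creating the libvirt pool.
    public class MountCheck {
        public static boolean isMounted(String targetPath) throws IOException {
            for (String line : Files.readAllLines(Paths.get("/proc/mounts"))) {
                String[] fields = line.split("\\s+");
                // the second field of /proc/mounts is the mountpoint
                if (fields.length > 1 && fields[1].equals(targetPath)) {
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) throws IOException {
            String target = "/mnt/eb3cd8fd-a462-35b9-882a-f4b9f2f4a84c";
            System.out.println(target + " mounted: " + isMounted(target));
        }
    }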

For my issue I'm writing a patch which adds some more debug lines to 
show what the Agent is doing, but it's kind of weird that we got into 
this "disconnected" state.

Wido

Re: Orphaned libvirt storage pools

Posted by Marcus Sorensen <sh...@gmail.com>.
I had seen something similar related to the KVM HA monitor (it would
re-mount the pools outside of libvirt after they were removed), but
anything using getStoragePoolByURI to register a pool shouldn't be
added to the KVMHA monitor anymore. That HA monitor script is the only
way I know of that cloudstack mounts NFS outside of libvirt, so it
seems the issue is in removing the mountpoint while it is in use.
Libvirt will remove the pool from its definition even if it can't be
unmounted, so perhaps there's an issue in verifying that the
mountpoint isn't in use before trying to delete the storage pool.

I am assuming that when you say 'in use' you mean the ISO is
connected to a VM. However, this could happen for any number of
reasons... say an admin is looking in the directory right when
cloudstack wants to delete the storage pool from libvirt.
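
A pre-check along those lines might help, e.g. refuse to delete the
pool while something still has files open under the mountpoint. A
rough sketch (not current Agent code; it assumes 'fuser' from psmisc
is available on the hypervisor):

    import java.io.IOException;

    // Hypothetical guard for deleteStoragePool: only destroy/undefine
    // the libvirt pool when nothing on the host still uses the mountpoint.
    public class InUseCheck {
        public static boolean mountpointInUse(String targetPath)
                throws IOException, InterruptedException {
            // fuser -m exits 0 when at least one process accesses the mount
            Process p = new ProcessBuilder("fuser", "-m", targetPath)
                    .redirectErrorStream(true)
                    .start();
            return p.waitFor() == 0;
        }

        public static void main(String[] args) throws Exception {
            String target = "/mnt/7ceb73e5-5ab1-3862-ad6e-52cb986aff0d";
            if (mountpointInUse(target)) {
                System.out.println("Refusing to delete pool, " + target + " is still in use");
            } else {
                System.out.println(target + " looks idle, safe to delete the pool");
            }
        }
    }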

On Fri, Jun 7, 2013 at 8:30 AM, Marcus Sorensen <sh...@gmail.com> wrote:
> Does this only happen with isos?
>
> On Jun 7, 2013 8:15 AM, "Wido den Hollander" <wi...@widodh.nl> wrote:
>>
>> Hi,
>>
>> So, I just created CLOUDSTACK-2893, but Wei Zhou mentioned that there are
>> some related issues:
>> * CLOUDSTACK-2729
>> * CLOUDSTACK-2780
>>
>> I restarted my Agent and the issue described in 2893 went away, but I'm
>> wondering how that happened.
>>
>> Anyway, after going further I found that I have some "orphaned" storage
>> pools, with that I mean, they are mounted and in use, but not defined nor
>> active in libvirt:
>>
>> root@n02:~# lsof |grep "\.iso"|awk '{print $9}'|cut -d '/' -f 3|sort
>> -n|uniq
>> eb3cd8fd-a462-35b9-882a-f4b9f2f4a84c
>> f84e51ab-d203-3114-b581-247b81b7d2c1
>> fd968b03-bd11-3179-a2b3-73def7c66c68
>> 7ceb73e5-5ab1-3862-ad6e-52cb986aff0d
>> 7dc0149e-0281-3353-91eb-4589ef2b1ec1
>> 8e005344-6a65-3802-ab36-31befc95abf3
>> 88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593
>> 765e63d7-e9f9-3203-bf4f-e55f83fe9177
>> 1287a27d-0383-3f5a-84aa-61211621d451
>> 98622150-41b2-3ba3-9c9c-09e3b6a2da03
>> root@n02:~#
>>
>> Looking at libvirt:
>> root@n02:~# virsh pool-list
>> Name                 State      Autostart
>> -----------------------------------------
>> 52801816-fe44-3a2b-a147-bb768eeea295 active     no
>> 7ceb73e5-5ab1-3862-ad6e-52cb986aff0d active     no
>> 88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593 active     no
>> a83d1100-4ffa-432a-8467-4dc266c4b0c8 active     no
>> fd968b03-bd11-3179-a2b3-73def7c66c68 active     no
>>
>> root@n02:~#
>>
>> What happens here is that the mountpoints are in use (ISO attached to
>> Instance) but there is no storage pool in libvirt.
>>
>> This means that when you try to deploy a second VM with the same ISO
>> libvirt will error out since the Agent will try to create and start a new
>> storage pool which will fail since the mountpoint is already in use.
>>
>> The remedy would be to take the hypervisor into maintainence, reboot int
>> completely and migrate Instances to it again.
>>
>> In libvirt there is no way to start a NFS storage pool without libvirt
>> mounting it.
>>
>> Any suggestions on how we can work around this code wise?
>>
>> For my issue I'm writing a patch which adds some more debug lines to show
>> what the Agent is doing, but it's kind of weird that we got into this
>> "disconnected" state.
>>
>> Wido

Re: Orphaned libvirt storage pools

Posted by Wido den Hollander <wi...@widodh.nl>.
Hi Wei,

This was with both 0.9.8 and 1.0.2.

Haven't been able to dig into this deeper yet.

Wido

On 06/12/2013 06:26 PM, Wei ZHOU wrote:
> Wido,
>
> Could you tell me the libvirt version?
> For our platform with this issue, the libvirt version is 0.9.13
>
> -Wei
>
>
> 2013/6/7 Marcus Sorensen <sh...@gmail.com>
>
>> There is already quite a bit of logging around this stuff, for example:
>>
>>                  s_logger.error("deleteStoragePool removed pool from
>> libvirt, but libvirt had trouble"
>>                                 + "unmounting the pool. Trying umount
>> location " + targetPath
>>                                 + "again in a few seconds");
>>
>> And if it gets an error from libvirt during create stating that the
>> mountpoint is in use, agent attempts to unmount before remounting. Of
>> course this would fail if it is in use.
>>
>>              // if error is that pool is mounted, try to handle it
>>              if (e.toString().contains("already mounted")) {
>>                  s_logger.error("Attempting to unmount old mount
>> libvirt is unaware of at "+targetPath);
>>                  String result = Script.runSimpleBashScript("umount " +
>> targetPath );
>>                  if (result == null) {
>>                      s_logger.error("Succeeded in unmounting " +
>> targetPath);
>>                      try {
>>                          sp = conn.storagePoolCreateXML(spd.toString(), 0);
>>                          s_logger.error("Succeeded in redefining storage");
>>                          return sp;
>>                      } catch (LibvirtException l) {
>>                          s_logger.error("Target was already mounted,
>> unmounted it but failed to redefine storage:" + l);
>>                      }
>>                  } else {
>>                      s_logger.error("Failed in unmounting and
>> redefining storage");
>>                  }
>>              }
>>
>>
>> Do you think it was related to the upgrade process itself (e.g. maybe
>> the storage pools didn't carry across the libvirt upgrade)? Can you
>> duplicate outside of the upgrade?
>>
>> On Fri, Jun 7, 2013 at 8:43 AM, Wido den Hollander <wi...@widodh.nl> wrote:
>>> Hi,
>>>
>>>
>>> On 06/07/2013 04:30 PM, Marcus Sorensen wrote:
>>>>
>>>> Does this only happen with isos?
>>>
>>>
>>> Yes, it does.
>>>
>>> My work-around for now was to locate all the Instances who had these ISOs
>>> attached and detach them from all (~100 instances..)
>>>
>>> Then I manually unmounted all the mountpoints under /mnt so that they
>> can be
>>> re-used again.
>>>
>>> This cluster was upgraded to 4.1 from 4.0 with libvirt 1.0.2 (coming from
>>> 0.9.8).
>>>
>>> Somehow libvirt forgot about these storage pools.
>>>
>>> Wido
>>>
>>>> On Jun 7, 2013 8:15 AM, "Wido den Hollander" <wi...@widodh.nl> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> So, I just created CLOUDSTACK-2893, but Wei Zhou mentioned that there
>> are
>>>>> some related issues:
>>>>> * CLOUDSTACK-2729
>>>>> * CLOUDSTACK-2780
>>>>>
>>>>> I restarted my Agent and the issue described in 2893 went away, but I'm
>>>>> wondering how that happened.
>>>>>
>>>>> Anyway, after going further I found that I have some "orphaned" storage
>>>>> pools, with that I mean, they are mounted and in use, but not defined
>> nor
>>>>> active in libvirt:
>>>>>
>>>>> root@n02:~# lsof |grep "\.iso"|awk '{print $9}'|cut -d '/' -f 3|sort
>>>>> -n|uniq
>>>>> eb3cd8fd-a462-35b9-882a-f4b9f2f4a84c
>>>>> f84e51ab-d203-3114-b581-247b81b7d2c1
>>>>> fd968b03-bd11-3179-a2b3-73def7c66c68
>>>>> 7ceb73e5-5ab1-3862-ad6e-52cb986aff0d
>>>>> 7dc0149e-0281-3353-91eb-4589ef2b1ec1
>>>>> 8e005344-6a65-3802-ab36-31befc95abf3
>>>>> 88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593
>>>>> 765e63d7-e9f9-3203-bf4f-e55f83fe9177
>>>>> 1287a27d-0383-3f5a-84aa-61211621d451
>>>>> 98622150-41b2-3ba3-9c9c-09e3b6a2da03
>>>>>
>>>>> root@n02:~#
>>>>>
>>>>> Looking at libvirt:
>>>>> root@n02:~# virsh pool-list
>>>>> Name                 State      Autostart
>>>>> -----------------------------------------
>>>>> 52801816-fe44-3a2b-a147-bb768eeea295 active     no
>>>>> 7ceb73e5-5ab1-3862-ad6e-52cb986aff0d active     no
>>>>> 88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593 active     no
>>>>> a83d1100-4ffa-432a-8467-4dc266c4b0c8 active     no
>>>>> fd968b03-bd11-3179-a2b3-73def7c66c68 active     no
>>>>>
>>>>>
>>>>> root@n02:~#
>>>>>
>>>>> What happens here is that the mountpoints are in use (ISO attached to
>>>>> Instance) but there is no storage pool in libvirt.
>>>>>
>>>>> This means that when you try to deploy a second VM with the same ISO
>>>>> libvirt will error out since the Agent will try to create and start a
>> new
>>>>> storage pool which will fail since the mountpoint is already in use.
>>>>>
>>>>> The remedy would be to take the hypervisor into maintainence, reboot
>> int
>>>>> completely and migrate Instances to it again.
>>>>>
>>>>> In libvirt there is no way to start a NFS storage pool without libvirt
>>>>> mounting it.
>>>>>
>>>>> Any suggestions on how we can work around this code wise?
>>>>>
>>>>> For my issue I'm writing a patch which adds some more debug lines to
>> show
>>>>> what the Agent is doing, but it's kind of weird that we got into this
>>>>> "disconnected" state.
>>>>>
>>>>> Wido
>>>>>
>>>>
>>>
>>
>

Re: Orphaned libvirt storage pools

Posted by Wei ZHOU <us...@gmail.com>.
Wido,

Could you tell me the libvirt version?
For our platform with this issue, the libvirt version is 0.9.13

-Wei


2013/6/7 Marcus Sorensen <sh...@gmail.com>

> There is already quite a bit of logging around this stuff, for example:
>
>                 s_logger.error("deleteStoragePool removed pool from
> libvirt, but libvirt had trouble"
>                                + "unmounting the pool. Trying umount
> location " + targetPath
>                                + "again in a few seconds");
>
> And if it gets an error from libvirt during create stating that the
> mountpoint is in use, agent attempts to unmount before remounting. Of
> course this would fail if it is in use.
>
>             // if error is that pool is mounted, try to handle it
>             if (e.toString().contains("already mounted")) {
>                 s_logger.error("Attempting to unmount old mount
> libvirt is unaware of at "+targetPath);
>                 String result = Script.runSimpleBashScript("umount " +
> targetPath );
>                 if (result == null) {
>                     s_logger.error("Succeeded in unmounting " +
> targetPath);
>                     try {
>                         sp = conn.storagePoolCreateXML(spd.toString(), 0);
>                         s_logger.error("Succeeded in redefining storage");
>                         return sp;
>                     } catch (LibvirtException l) {
>                         s_logger.error("Target was already mounted,
> unmounted it but failed to redefine storage:" + l);
>                     }
>                 } else {
>                     s_logger.error("Failed in unmounting and
> redefining storage");
>                 }
>             }
>
>
> Do you think it was related to the upgrade process itself (e.g. maybe
> the storage pools didn't carry across the libvirt upgrade)? Can you
> duplicate outside of the upgrade?
>
> On Fri, Jun 7, 2013 at 8:43 AM, Wido den Hollander <wi...@widodh.nl> wrote:
> > Hi,
> >
> >
> > On 06/07/2013 04:30 PM, Marcus Sorensen wrote:
> >>
> >> Does this only happen with isos?
> >
> >
> > Yes, it does.
> >
> > My work-around for now was to locate all the Instances who had these ISOs
> > attached and detach them from all (~100 instances..)
> >
> > Then I manually unmounted all the mountpoints under /mnt so that they
> can be
> > re-used again.
> >
> > This cluster was upgraded to 4.1 from 4.0 with libvirt 1.0.2 (coming from
> > 0.9.8).
> >
> > Somehow libvirt forgot about these storage pools.
> >
> > Wido
> >
> >> On Jun 7, 2013 8:15 AM, "Wido den Hollander" <wi...@widodh.nl> wrote:
> >>
> >>> Hi,
> >>>
> >>> So, I just created CLOUDSTACK-2893, but Wei Zhou mentioned that there
> are
> >>> some related issues:
> >>> * CLOUDSTACK-2729
> >>> * CLOUDSTACK-2780
> >>>
> >>> I restarted my Agent and the issue described in 2893 went away, but I'm
> >>> wondering how that happened.
> >>>
> >>> Anyway, after going further I found that I have some "orphaned" storage
> >>> pools, with that I mean, they are mounted and in use, but not defined
> nor
> >>> active in libvirt:
> >>>
> >>> root@n02:~# lsof |grep "\.iso"|awk '{print $9}'|cut -d '/' -f 3|sort
> >>> -n|uniq
> >>> eb3cd8fd-a462-35b9-882a-f4b9f2f4a84c
> >>> f84e51ab-d203-3114-b581-247b81b7d2c1
> >>> fd968b03-bd11-3179-a2b3-73def7c66c68
> >>> 7ceb73e5-5ab1-3862-ad6e-52cb986aff0d
> >>> 7dc0149e-0281-3353-91eb-4589ef2b1ec1
> >>> 8e005344-6a65-3802-ab36-31befc95abf3
> >>> 88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593
> >>> 765e63d7-e9f9-3203-bf4f-e55f83fe9177
> >>> 1287a27d-0383-3f5a-84aa-61211621d451
> >>> 98622150-41b2-3ba3-9c9c-09e3b6a2da03
> >>>
> >>> root@n02:~#
> >>>
> >>> Looking at libvirt:
> >>> root@n02:~# virsh pool-list
> >>> Name                 State      Autostart
> >>> -----------------------------------------
> >>> 52801816-fe44-3a2b-a147-bb768eeea295 active     no
> >>> 7ceb73e5-5ab1-3862-ad6e-52cb986aff0d active     no
> >>> 88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593 active     no
> >>> a83d1100-4ffa-432a-8467-4dc266c4b0c8 active     no
> >>> fd968b03-bd11-3179-a2b3-73def7c66c68 active     no
> >>>
> >>>
> >>> root@n02:~#
> >>>
> >>> What happens here is that the mountpoints are in use (ISO attached to
> >>> Instance) but there is no storage pool in libvirt.
> >>>
> >>> This means that when you try to deploy a second VM with the same ISO
> >>> libvirt will error out since the Agent will try to create and start a
> new
> >>> storage pool which will fail since the mountpoint is already in use.
> >>>
> >>> The remedy would be to take the hypervisor into maintainence, reboot
> int
> >>> completely and migrate Instances to it again.
> >>>
> >>> In libvirt there is no way to start a NFS storage pool without libvirt
> >>> mounting it.
> >>>
> >>> Any suggestions on how we can work around this code wise?
> >>>
> >>> For my issue I'm writing a patch which adds some more debug lines to
> show
> >>> what the Agent is doing, but it's kind of weird that we got into this
> >>> "disconnected" state.
> >>>
> >>> Wido
> >>>
> >>
> >
>

Re: Orphaned libvirt storage pools

Posted by Marcus Sorensen <sh...@gmail.com>.
There is already quite a bit of logging around this stuff, for example:

                s_logger.error("deleteStoragePool removed pool from
libvirt, but libvirt had trouble"
                               + "unmounting the pool. Trying umount
location " + targetPath
                               + "again in a few seconds");

And if it gets an error from libvirt during create stating that the
mountpoint is already mounted, the agent attempts to unmount it before
remounting. Of course this will fail if the mountpoint is in use.

            // if error is that pool is mounted, try to handle it
            if (e.toString().contains("already mounted")) {
                s_logger.error("Attempting to unmount old mount libvirt is unaware of at " + targetPath);
                String result = Script.runSimpleBashScript("umount " + targetPath);
                if (result == null) {
                    s_logger.error("Succeeded in unmounting " + targetPath);
                    try {
                        sp = conn.storagePoolCreateXML(spd.toString(), 0);
                        s_logger.error("Succeeded in redefining storage");
                        return sp;
                    } catch (LibvirtException l) {
                        s_logger.error("Target was already mounted, unmounted it but failed to redefine storage:" + l);
                    }
                } else {
                    s_logger.error("Failed in unmounting and redefining storage");
                }
            }


Do you think it was related to the upgrade process itself (e.g. maybe
the storage pools didn't carry across the libvirt upgrade)? Can you
duplicate outside of the upgrade?

On Fri, Jun 7, 2013 at 8:43 AM, Wido den Hollander <wi...@widodh.nl> wrote:
> Hi,
>
>
> On 06/07/2013 04:30 PM, Marcus Sorensen wrote:
>>
>> Does this only happen with isos?
>
>
> Yes, it does.
>
> My work-around for now was to locate all the Instances who had these ISOs
> attached and detach them from all (~100 instances..)
>
> Then I manually unmounted all the mountpoints under /mnt so that they can be
> re-used again.
>
> This cluster was upgraded to 4.1 from 4.0 with libvirt 1.0.2 (coming from
> 0.9.8).
>
> Somehow libvirt forgot about these storage pools.
>
> Wido
>
>> On Jun 7, 2013 8:15 AM, "Wido den Hollander" <wi...@widodh.nl> wrote:
>>
>>> Hi,
>>>
>>> So, I just created CLOUDSTACK-2893, but Wei Zhou mentioned that there are
>>> some related issues:
>>> * CLOUDSTACK-2729
>>> * CLOUDSTACK-2780
>>>
>>> I restarted my Agent and the issue described in 2893 went away, but I'm
>>> wondering how that happened.
>>>
>>> Anyway, after going further I found that I have some "orphaned" storage
>>> pools, with that I mean, they are mounted and in use, but not defined nor
>>> active in libvirt:
>>>
>>> root@n02:~# lsof |grep "\.iso"|awk '{print $9}'|cut -d '/' -f 3|sort
>>> -n|uniq
>>> eb3cd8fd-a462-35b9-882a-f4b9f2f4a84c
>>> f84e51ab-d203-3114-b581-247b81b7d2c1
>>> fd968b03-bd11-3179-a2b3-73def7c66c68
>>> 7ceb73e5-5ab1-3862-ad6e-52cb986aff0d
>>> 7dc0149e-0281-3353-91eb-4589ef2b1ec1
>>> 8e005344-6a65-3802-ab36-31befc95abf3
>>> 88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593
>>> 765e63d7-e9f9-3203-bf4f-e55f83fe9177
>>> 1287a27d-0383-3f5a-84aa-61211621d451
>>> 98622150-41b2-3ba3-9c9c-09e3b6a2da03
>>>
>>> root@n02:~#
>>>
>>> Looking at libvirt:
>>> root@n02:~# virsh pool-list
>>> Name                 State      Autostart
>>> -----------------------------------------
>>> 52801816-fe44-3a2b-a147-bb768eeea295 active     no
>>> 7ceb73e5-5ab1-3862-ad6e-52cb986aff0d active     no
>>> 88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593 active     no
>>> a83d1100-4ffa-432a-8467-4dc266c4b0c8 active     no
>>> fd968b03-bd11-3179-a2b3-73def7c66c68 active     no
>>>
>>>
>>> root@n02:~#
>>>
>>> What happens here is that the mountpoints are in use (ISO attached to
>>> Instance) but there is no storage pool in libvirt.
>>>
>>> This means that when you try to deploy a second VM with the same ISO
>>> libvirt will error out since the Agent will try to create and start a new
>>> storage pool which will fail since the mountpoint is already in use.
>>>
>>> The remedy would be to take the hypervisor into maintainence, reboot int
>>> completely and migrate Instances to it again.
>>>
>>> In libvirt there is no way to start a NFS storage pool without libvirt
>>> mounting it.
>>>
>>> Any suggestions on how we can work around this code wise?
>>>
>>> For my issue I'm writing a patch which adds some more debug lines to show
>>> what the Agent is doing, but it's kind of weird that we got into this
>>> "disconnected" state.
>>>
>>> Wido
>>>
>>
>

Re: Orphaned libvirt storage pools

Posted by Wido den Hollander <wi...@widodh.nl>.
Hi,

On 06/07/2013 04:30 PM, Marcus Sorensen wrote:
> Does this only happen with isos?

Yes, it does.

My work-around for now was to locate all the Instances that had these 
ISOs attached and detach the ISO from all of them (~100 Instances..)

Then I manually unmounted all the mountpoints under /mnt so that they 
could be re-used again.
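
Something like this could automate finding those orphaned mounts, by
comparing /proc/mounts with the pools libvirt knows about. A rough
sketch (assuming the org.libvirt Java bindings the Agent already uses,
and that pool names match the directory names under /mnt, as in the
listing above):

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    import org.libvirt.Connect;

    // Hypothetical check: list mountpoints under /mnt that have no
    // corresponding active libvirt storage pool.
    public class OrphanedMounts {
        public static void main(String[] args) throws Exception {
            Connect conn = new Connect("qemu:///system");
            Set<String> pools = new HashSet<String>(Arrays.asList(conn.listStoragePools()));

            for (String line : Files.readAllLines(Paths.get("/proc/mounts"))) {
                String mountPoint = line.split("\\s+")[1];
                if (mountPoint.startsWith("/mnt/")) {
                    String name = mountPoint.substring("/mnt/".length());
                    if (!pools.contains(name)) {
                        System.out.println("Orphaned mount, no libvirt pool: " + mountPoint);
                    }
                }
            }
            conn.close();
        }
    }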

This cluster was upgraded to 4.1 from 4.0 with libvirt 1.0.2 (coming 
from 0.9.8).

Somehow libvirt forgot about these storage pools.

Wido

> On Jun 7, 2013 8:15 AM, "Wido den Hollander" <wi...@widodh.nl> wrote:
>
>> Hi,
>>
>> So, I just created CLOUDSTACK-2893, but Wei Zhou mentioned that there are
>> some related issues:
>> * CLOUDSTACK-2729
>> * CLOUDSTACK-2780
>>
>> I restarted my Agent and the issue described in 2893 went away, but I'm
>> wondering how that happened.
>>
>> Anyway, after going further I found that I have some "orphaned" storage
>> pools, with that I mean, they are mounted and in use, but not defined nor
>> active in libvirt:
>>
>> root@n02:~# lsof |grep "\.iso"|awk '{print $9}'|cut -d '/' -f 3|sort
>> -n|uniq
>> eb3cd8fd-a462-35b9-882a-f4b9f2f4a84c
>> f84e51ab-d203-3114-b581-247b81b7d2c1
>> fd968b03-bd11-3179-a2b3-73def7c66c68
>> 7ceb73e5-5ab1-3862-ad6e-52cb986aff0d
>> 7dc0149e-0281-3353-91eb-4589ef2b1ec1
>> 8e005344-6a65-3802-ab36-31befc95abf3
>> 88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593
>> 765e63d7-e9f9-3203-bf4f-e55f83fe9177
>> 1287a27d-0383-3f5a-84aa-61211621d451
>> 98622150-41b2-3ba3-9c9c-09e3b6a2da03
>> root@n02:~#
>>
>> Looking at libvirt:
>> root@n02:~# virsh pool-list
>> Name                 State      Autostart
>> -----------------------------------------
>> 52801816-fe44-3a2b-a147-bb768eeea295 active     no
>> 7ceb73e5-5ab1-3862-ad6e-52cb986aff0d active     no
>> 88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593 active     no
>> a83d1100-4ffa-432a-8467-4dc266c4b0c8 active     no
>> fd968b03-bd11-3179-a2b3-73def7c66c68 active     no
>>
>> root@n02:~#
>>
>> What happens here is that the mountpoints are in use (ISO attached to
>> Instance) but there is no storage pool in libvirt.
>>
>> This means that when you try to deploy a second VM with the same ISO
>> libvirt will error out since the Agent will try to create and start a new
>> storage pool which will fail since the mountpoint is already in use.
>>
>> The remedy would be to take the hypervisor into maintainence, reboot int
>> completely and migrate Instances to it again.
>>
>> In libvirt there is no way to start a NFS storage pool without libvirt
>> mounting it.
>>
>> Any suggestions on how we can work around this code wise?
>>
>> For my issue I'm writing a patch which adds some more debug lines to show
>> what the Agent is doing, but it's kind of weird that we got into this
>> "disconnected" state.
>>
>> Wido
>>
>

Re: Orphaned libvirt storage pools

Posted by Marcus Sorensen <sh...@gmail.com>.
Does this only happen with isos?
On Jun 7, 2013 8:15 AM, "Wido den Hollander" <wi...@widodh.nl> wrote:

> Hi,
>
> So, I just created CLOUDSTACK-2893, but Wei Zhou mentioned that there are
> some related issues:
> * CLOUDSTACK-2729
> * CLOUDSTACK-2780
>
> I restarted my Agent and the issue described in 2893 went away, but I'm
> wondering how that happened.
>
> Anyway, after going further I found that I have some "orphaned" storage
> pools, with that I mean, they are mounted and in use, but not defined nor
> active in libvirt:
>
> root@n02:~# lsof |grep "\.iso"|awk '{print $9}'|cut -d '/' -f 3|sort
> -n|uniq
> eb3cd8fd-a462-35b9-882a-f4b9f2f4a84c
> f84e51ab-d203-3114-b581-247b81b7d2c1
> fd968b03-bd11-3179-a2b3-73def7c66c68
> 7ceb73e5-5ab1-3862-ad6e-52cb986aff0d
> 7dc0149e-0281-3353-91eb-4589ef2b1ec1
> 8e005344-6a65-3802-ab36-31befc95abf3
> 88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593
> 765e63d7-e9f9-3203-bf4f-e55f83fe9177
> 1287a27d-0383-3f5a-84aa-61211621d451
> 98622150-41b2-3ba3-9c9c-09e3b6a2da03
> root@n02:~#
>
> Looking at libvirt:
> root@n02:~# virsh pool-list
> Name                 State      Autostart
> -----------------------------------------
> 52801816-fe44-3a2b-a147-bb768eeea295 active     no
> 7ceb73e5-5ab1-3862-ad6e-52cb986aff0d active     no
> 88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593 active     no
> a83d1100-4ffa-432a-8467-4dc266c4b0c8 active     no
> fd968b03-bd11-3179-a2b3-73def7c66c68 active     no
>
> root@n02:~#
>
> What happens here is that the mountpoints are in use (ISO attached to
> Instance) but there is no storage pool in libvirt.
>
> This means that when you try to deploy a second VM with the same ISO
> libvirt will error out since the Agent will try to create and start a new
> storage pool which will fail since the mountpoint is already in use.
>
> The remedy would be to take the hypervisor into maintainence, reboot int
> completely and migrate Instances to it again.
>
> In libvirt there is no way to start a NFS storage pool without libvirt
> mounting it.
>
> Any suggestions on how we can work around this code wise?
>
> For my issue I'm writing a patch which adds some more debug lines to show
> what the Agent is doing, but it's kind of weird that we got into this
> "disconnected" state.
>
> Wido
>