Posted to dev@cloudstack.apache.org by Edison Su <Ed...@citrix.com> on 2013/08/02 01:55:10 UTC

RE: How to fix libvirt storage pool refresh issue?

Hi Wei, regarding bug CLOUDSTACK-2729: I removed the storage.refresh call made during getStoragePool in LibvirtStorageAdaptor, but the issue still occurred in BVT.
I am thinking of adding a file lock on primary storage. It seems you already have a patch for that; could you share it with us?
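
For illustration only, the file-lock idea could look roughly like the sketch below. It is not the patch referred to above, and it is written in Python for brevity although the CloudStack agent is Java. The function name pool_lock and the lock file name .cloudstack-pool.lock are hypothetical. It assumes the pool is mounted at a local path and that POSIX advisory locks (fcntl.lockf) are honoured by the NFS server, which depends on the NFS version and mount options.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def pool_lock(pool_mount_path, lock_name=".cloudstack-pool.lock"):
    """Hold an exclusive advisory lock on a file inside the pool mount point.

    lock_name is a hypothetical file name chosen for this sketch.
    """
    lock_path = os.path.join(pool_mount_path, lock_name)
    # Create the lock file if it does not exist yet; its content is irrelevant.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Blocks until no other process holds the lock.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        yield lock_path
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)


if __name__ == "__main__":
    # A temporary directory stands in for the NFS mount point here.
    with tempfile.TemporaryDirectory() as mount:
        with pool_lock(mount) as path:
            print("holding", path)
```

Every caller that wraps its storage pool operations in pool_lock is serialized against all others. That is also why such a lock can slow down VM deployment, as noted in the quoted reply below.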

> -----Original Message-----
> From: Wei ZHOU [mailto:ustcweizhou@gmail.com]
> Sent: Tuesday, July 16, 2013 3:35 AM
> To: dev@cloudstack.apache.org
> Subject: Re: How to fix libvirt storage pool refresh issue?
> 
> I agree with Wido.
> 
> Moreover, the file lock will degrade the performance of VM deployment.
> 
> -Wei
> 
> 
> 2013/7/16 Wido den Hollander <wi...@widodh.nl>
> 
> > On 07/16/2013 12:27 AM, Marcus Sorensen wrote:
> >
> >> I'm ok with a symptom fix on our end; if the root cause is in
> >> libvirt we can't do much about that. This is the sort of patch that
> >> tends to get pulled into the regular update cycle of the
> >> distributions, so unless there's more to it and it's not a good fix I
> >> imagine we will see it come through without having to wait for the
> >> next point releases. We still have to support existing users who
> >> might not be running the latest, though, so the symptom fix is
> >> probably ok as a temporary measure.
> >>
> >
> > I'm ok with not calling storagePoolRefresh every time we want a
> > capacity update, since that's also kind of I/O intensive for larger
> > storage arrays.
> >
> > However, we should make sure we have a GOOD comment in the code about
> > this "fix", since that's the reason I initially removed the old code
> > which invoked "df".
> >
> > I'll see if I can get this libvirt patch into Ubuntu when it hits
> > libvirt upstream, since this bug is really annoying.
> >
> > Wido
> >
> >
> >
> >> On Mon, Jul 15, 2013 at 3:42 PM, Edison Su <Ed...@citrix.com> wrote:
> >>
> >>> There is a serious issue on KVM
> >>> (https://issues.apache.org/jira/browse/CLOUDSTACK-2729):
> >>> a libvirt storage pool can disappear on a KVM host. It is easy to
> >>> reproduce in our internal QA environment.
> >>> Wei found the root cause, which is in libvirt:
> >>> "
> >>> This is a libvirt issue. I created a ticket for it.
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=977706
> >>> The patch is very simple.
> >>> https://www.redhat.com/archives/libvir-list/2013-July/msg00635.html
> >>> "
> >>> But CloudStack also contributes to it, because it calls the libvirt
> >>> storage pool refresh method every time it accesses the storage
> >>> pool. That code was added by commit
> >>> 2ffc9907f7b0d371737e39b7649f7af23026f5cf, less than one year ago.
> >>>
> >>> As Wei suggested, we can call storage pool refresh only when needed.
> >>> That will mitigate the issue (it is what I did in CloudStack
> >>> pre-4.0), but it only treats the symptom, not the cause.
> >>> Alternatively, we can add a cluster-wide lock so that only one client
> >>> can access the storage pool at a time, for example a file lock on NFS
> >>> primary storage.
> >>> Any ideas or feedback on how to fix this KVM issue?
> >>>
> >>>
> >>>
> >>>
> >
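
The "refresh only if needed" mitigation discussed in the quoted thread can be sketched as follows. This is a minimal illustration, not the code in LibvirtStorageAdaptor. FakePool, VolumeNotFound and get_volume are hypothetical names. The two pool methods are named after the libvirt-python calls storageVolLookupByName and refresh, and a real binding would raise its own error type (libvirt.libvirtError in libvirt-python) instead of VolumeNotFound.

```python
class VolumeNotFound(Exception):
    """Stand-in for the lookup error a real libvirt binding would raise."""


class FakePool:
    """In-memory stand-in for a libvirt storage pool, used only in this sketch."""

    def __init__(self, on_disk):
        self.on_disk = set(on_disk)   # volumes that exist on the storage
        self.known = set()            # volumes libvirt has seen so far
        self.refresh_calls = 0

    def refresh(self, flags=0):
        # A real refresh rescans the whole pool, which is the expensive part.
        self.refresh_calls += 1
        self.known = set(self.on_disk)

    def storageVolLookupByName(self, name):
        if name not in self.known:
            raise VolumeNotFound(name)
        return name


def get_volume(pool, name, not_found=VolumeNotFound):
    """Look the volume up first and refresh the pool only on a miss."""
    try:
        return pool.storageVolLookupByName(name)
    except not_found:
        pool.refresh(0)
        # A second miss propagates to the caller as a real error.
        return pool.storageVolLookupByName(name)


if __name__ == "__main__":
    pool = FakePool(["vol-a"])
    get_volume(pool, "vol-a")   # first lookup misses, triggers one refresh
    get_volume(pool, "vol-a")   # second lookup hits, no refresh
    print("refresh calls:", pool.refresh_calls)
```

Compared with refreshing on every pool access, this reduces how often the refresh path runs, so it narrows the window for the libvirt bug without removing it.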

Re: How to fix libvirt storage pool refresh issue?

Posted by Wido den Hollander <wi...@widodh.nl>.

On 08/02/2013 01:22 PM, Wei ZHOU wrote:
> Wido,
>
> You applied the libvirt patch on your production system, and this issue
> disappeared, right?

Yes, I applied that patch and rebuilt libvirt. The issue never came back.

> If so, that is good.
> I expect the Red Hat community can accept the patch (or the v2
> https://www.redhat.com/archives/libvir-list/2013-July/msg00639.html) ASAP.
>

I've been watching the thread, but it seems kind of dead.

Wido

>
> -Wei
>
>
> 2013/8/2 Wido den Hollander <wi...@widodh.nl>
>
>>
>>
>> On 08/02/2013 01:55 AM, Edison Su wrote:
>>
>>> Hi Wei, regarding bug CLOUDSTACK-2729: I removed the storage.refresh
>>> call made during getStoragePool in LibvirtStorageAdaptor, but the issue
>>> still occurred in BVT.
>>> I am thinking of adding a file lock on primary storage. It seems you
>>> already have a patch for that; could you share it with us?
>>>
>>>
>> FYI, I fixed this by patching libvirt on our production systems rather
>> than fixing the CloudStack agent.
>>
>> It's just one very small patch:
>> https://bugzilla.redhat.com/show_bug.cgi?id=977706
>>
>> https://www.redhat.com/archives/libvir-list/2013-July/msg00635.html
>>
>> Wido
>>
>>

Re: How to fix libvirt storage pool refresh issue?

Posted by Wei ZHOU <us...@gmail.com>.
Wido,

You applied the libvirt patch on your production system, and this issue
disappeared, right?
If so, that is good.
I expect the Red Hat community can accept the patch (or the v2
https://www.redhat.com/archives/libvir-list/2013-July/msg00639.html) ASAP.


-Wei


2013/8/2 Wido den Hollander <wi...@widodh.nl>

>
>
> On 08/02/2013 01:55 AM, Edison Su wrote:
>
>> Hi Wei, regarding bug CLOUDSTACK-2729: I removed the storage.refresh
>> call made during getStoragePool in LibvirtStorageAdaptor, but the issue
>> still occurred in BVT.
>> I am thinking of adding a file lock on primary storage. It seems you
>> already have a patch for that; could you share it with us?
>>
>>
> FYI, I fixed this by patching libvirt on our production systems rather
> than fixing the CloudStack agent.
>
> It's just one very small patch:
> https://bugzilla.redhat.com/show_bug.cgi?id=977706
>
> https://www.redhat.com/archives/libvir-list/2013-July/msg00635.html
>
> Wido
>
>

Re: How to fix libvirt storage pool refresh issue?

Posted by Wido den Hollander <wi...@widodh.nl>.

On 08/02/2013 01:55 AM, Edison Su wrote:
> Hi Wei, regarding bug CLOUDSTACK-2729: I removed the storage.refresh call made during getStoragePool in LibvirtStorageAdaptor, but the issue still occurred in BVT.
> I am thinking of adding a file lock on primary storage. It seems you already have a patch for that; could you share it with us?
>

FYI, I fixed this by patching libvirt on our production systems
rather than fixing the CloudStack agent.

It's just one very small patch: 
https://bugzilla.redhat.com/show_bug.cgi?id=977706

https://www.redhat.com/archives/libvir-list/2013-July/msg00635.html

Wido
