Posted to dev@cloudstack.apache.org by Andrei Mikhailovsky <an...@arhont.com> on 2014/04/02 14:32:53 UTC

Re: ALARM - ACS reboots host servers!!!

Coming back to this issue.

This time, to perform maintenance on the NFS primary storage, I placed the storage in question into Maintenance mode. After about 20 minutes ACS showed the NFS storage as being in Maintenance. However, none of the virtual machines with volumes on that storage were stopped. I manually stopped the virtual machines and went on to upgrade and restart the NFS server.

A few minutes after the NFS server shut down, all of my host servers rebooted, killing all VMs!

Thus, it seems that putting the NFS storage in Maintenance mode does not stop the ACS agent from restarting the host servers.

Does anyone know a way to stop this behaviour? 

Thanks

Andrei


----- Original Message -----
From: "France" <ma...@isg.si>
To: users@cloudstack.apache.org
Cc: dev@cloudstack.apache.org
Sent: Monday, 3 March, 2014 9:49:28 AM
Subject: Re: ALARM - ACS reboots host servers!!!

I believe this is a bug too, because VMs that are not running on that storage
get destroyed as well.

The issue has been around for a long time, like all the others I have reported.
They do not get fixed:
https://issues.apache.org/jira/browse/CLOUDSTACK-3367

We even lost the assignee today.

Regards,
F.

On 3/3/14 6:55 AM, Koushik Das wrote:
> The primary storage needs to be put in maintenance before doing any upgrade/reboot as mentioned in the previous mails.
>
> -Koushik
>
> On 03-Mar-2014, at 6:07 AM, Marcus <sh...@gmail.com> wrote:
>
>> Also, please note that in the bug you referenced it doesn't have a
>> problem with the reboot being triggered, but with the fact that reboot
>> never completes due to hanging NFS mount (which is why the reboot
>> occurs, inaccessible primary storage).
>>
>> On Sun, Mar 2, 2014 at 5:26 PM, Marcus <sh...@gmail.com> wrote:
>>> Or do you mean you have multiple primary storages and this one was not
>>> in use and put into maintenance?
>>>
>>> On Sun, Mar 2, 2014 at 5:25 PM, Marcus <sh...@gmail.com> wrote:
>>>> I'm not sure I understand. How do you expect to reboot your primary
>>>> storage while vms are running?  It sounds like the host is being
>>>> fenced since it cannot contact the resources it depends on.
>>>>
>>>> On Sun, Mar 2, 2014 at 3:24 PM, Nux! <nu...@li.nux.ro> wrote:
>>>>> On 02.03.2014 21:17, Andrei Mikhailovsky wrote:
>>>>>> Hello guys,
>>>>>>
>>>>>>
>>>>>> I recently came across the bug CLOUDSTACK-5429, which rebooted
>>>>>> all of my host servers without properly shutting down the guest VMs.
>>>>>> I simply upgraded and rebooted one of the NFS primary storage
>>>>>> servers and a few minutes later, to my horror, I found that all
>>>>>> of my host servers had been rebooted. Is it just me, or should
>>>>>> this bug be fixed ASAP and be a blocker for any new
>>>>>> ACS release? Not only does it cause downtime, but also possible
>>>>>> data loss and server corruption.
>>>>>
>>>>> Hi Andrei,
>>>>>
>>>>> Do you have HA enabled and did you put that primary storage in maintenance
>>>>> mode before rebooting it?
>>>>> It's my understanding that ACS relies on the shared storage to perform HA so
>>>>> if the storage goes it's expected to go berserk. I've noticed similar
>>>>> behaviour in Xenserver pools without ACS.
>>>>> I'd imagine a "cure" for this would be to use network distributed
>>>>> "filesystems" like GlusterFS or CEPH.
>>>>>
>>>>> Lucian
>>>>>
>>>>> --
>>>>> Sent from the Delta quadrant using Borg technology!
>>>>>
>>>>> Nux!
>>>>> www.nux.ro

Re: ALARM - ACS reboots host servers!!!

Posted by France <ma...@isg.si>.

I think this problem might only exist on KVM.
Can anyone with NFS primary storage test it on XenServer?


On 3/4/14 9:48 PM, Andrei Mikhailovsky wrote:
> +1
>
>
> ----- Original Message -----
> From: "Alex Huang" <Al...@citrix.com>
> To: dev@cloudstack.apache.org
> Sent: Thursday, 3 April, 2014 6:47:22 PM
> Subject: RE: ALARM - ACS reboots host servers!!!
>
> This is a severe bug if that's the case.  It's supposed to stop the heartbeat script when a primary storage is placed in maintenance.
>
> --Alex

Re: ALARM - ACS reboots host servers!!!

Posted by Andrei Mikhailovsky <an...@arhont.com>.

+1


----- Original Message -----
From: "Alex Huang" <Al...@citrix.com>
To: dev@cloudstack.apache.org
Sent: Thursday, 3 April, 2014 6:47:22 PM
Subject: RE: ALARM - ACS reboots host servers!!!

This is a severe bug if that's the case.  It's supposed to stop the heartbeat script when a primary storage is placed in maintenance.

--Alex


RE: ALARM - ACS reboots host servers!!!

Posted by Alex Huang <Al...@citrix.com>.

This is a severe bug if that's the case.  It's supposed to stop the heartbeat script when a primary storage is placed in maintenance.
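
The intended behaviour described above can be illustrated with a minimal sketch. This is not CloudStack code: the StoragePool type, the state labels "Up" and "Maintenance", and both function names are hypothetical and exist only for this example. It shows a host that heartbeats only against pools not in maintenance, so losing a pool that is in maintenance never triggers fencing.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class StoragePool:
    # Hypothetical model of a primary storage pool, for illustration only.
    name: str
    state: str  # assumed labels: "Up" or "Maintenance"


def pools_to_heartbeat(pools: Iterable[StoragePool]) -> List[StoragePool]:
    """Return only the pools the heartbeat should watch (not in maintenance)."""
    return [p for p in pools if p.state != "Maintenance"]


def host_must_fence(pools: Iterable[StoragePool], failed: Set[str]) -> bool:
    """True only if a pool that is still being monitored has failed."""
    return any(p.name in failed for p in pools_to_heartbeat(pools))


pools = [StoragePool("nfs1", "Maintenance"), StoragePool("nfs2", "Up")]
print(host_must_fence(pools, {"nfs1"}))  # nfs1 is in maintenance, so it is ignored
print(host_must_fence(pools, {"nfs2"}))  # nfs2 is still monitored
```

With a filter like this, taking nfs1 offline for an upgrade would not cause the host to reboot, while an unexpected failure of nfs2 still would.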

--Alex

> -----Original Message-----
> From: France [mailto:mailinglists@isg.si]
> Sent: Thursday, April 3, 2014 1:06 AM
> To: dev@cloudstack.apache.org
> Subject: Re: ALARM - ACS reboots host servers!!!
> 
> I'm also interested in this issue.
> Can any of the developers confirm whether this is expected behavior?

Re: ALARM - ACS reboots host servers!!!

Posted by Andrei Mikhailovsky <an...@arhont.com>.

I am on KVM. Thanks.

----- Original Message -----
From: "France" <ma...@isg.si>
To: dev@cloudstack.apache.org
Sent: Thursday, 3 April, 2014 2:34:53 PM
Subject: Re: ALARM - ACS reboots host servers!!!

Andrei,

is your hypervisor KVM?
I'm using XenServer.

Re: ALARM - ACS reboots host servers!!!

Posted by France <ma...@isg.si>.

Andrei,

Is your hypervisor KVM?
I'm using XenServer.

Re: ALARM - ACS reboots host servers!!!

Posted by Wido den Hollander <wi...@widodh.nl>.


On 04/03/2014 10:06 AM, France wrote:
> I'm also interested in this issue.
> Can any of the developers confirm whether this is expected behavior?
>

Yes, this still happens because of the kvmheartbeat.sh script that runs on the KVM hosts.

On some clusters I disabled this by simply overwriting that script with 
a version where "reboot" is removed.

I have some ideas on how to fix this, but I don't have the time at the 
moment.

Short version: the hosts shouldn't reboot themselves as long as they can 
reach other nodes, or this should at least be configurable.

The management server should also do further inspection during HA by 
using a helper on the KVM Agent.
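
A rough sketch of the "short version" above, under stated assumptions: FencePolicy, its two settings and should_self_fence are hypothetical names invented for this example and do not exist in CloudStack or in kvmheartbeat.sh. The idea is only that a failed storage heartbeat leads to a reboot when the operator allows it and too few peer nodes are reachable.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FencePolicy:
    # Hypothetical settings; CloudStack has no options under these names.
    reboot_enabled: bool = True    # operator-level switch to disable self-reboot
    min_reachable_peers: int = 1   # this many reachable peers veto the reboot


def should_self_fence(storage_heartbeat_ok: bool,
                      peer_reachability: Iterable[bool],
                      policy: FencePolicy = FencePolicy()) -> bool:
    """Return True only when the host should reboot itself."""
    if storage_heartbeat_ok:
        return False  # storage is fine, nothing to fence
    if not policy.reboot_enabled:
        return False  # operator has disabled self-reboot
    reachable = sum(1 for ok in peer_reachability if ok)
    # Reboot only if the host looks isolated from its peers as well.
    return reachable < policy.min_reachable_peers


# Storage heartbeat failed, but one peer still answers: no reboot.
print(should_self_fence(False, [True, False]))
# Storage heartbeat failed and no peer answers: reboot.
print(should_self_fence(False, [False, False]))
```

Whether peer reachability is the right signal is a design question; the point is that the decision becomes a policy rather than an unconditional reboot.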

Wido


Re: ALARM - ACS reboots host servers!!!

Posted by France <ma...@isg.si>.

I'm also interested in this issue.
Can any of the developers confirm whether this is expected behavior?

On 2/4/14 2:32 PM, Andrei Mikhailovsky wrote:
> Coming back to this issue.
>
> This time to perform the maintenance of the nfs primary storage I've plated the storage in question in the Maintenance mode. After about 20 minutes ACS showed the nfs storage is in Maintenance. However, none of the virtual machines with volumes on that storage were stopped. I've manually stopped the virtual machines and went to upgrade and restart the nfs server.
>
> A few minutes after the nfs server shutdown all of my host servers went into reboot killing all vms!
>
> Thus, it seems that putting nfs server in Maintenance mode does not stop ACS agent from restarting the host servers.
>
> Does anyone know a way to stop this behaviour?
>
> Thanks
>
> Andrei
>
>
> ----- Original Message -----
> From: "France" <ma...@isg.si>
> To: users@cloudstack.apache.org
> Cc: dev@cloudstack.apache.org
> Sent: Monday, 3 March, 2014 9:49:28 AM
> Subject: Re: ALARM - ACS reboots host servers!!!
>
> I believe this is a bug too, because VMs not running on the storage, get
> destroyed too:
>
> Issue has been around for a long time, like with all others I reported.
> They do not get fixed:
> https://issues.apache.org/jira/browse/CLOUDSTACK-3367
>
> We even lost assignee today.
>
> Regards,
> F.
>
> On 3/3/14 6:55 AM, Koushik Das wrote:
>> The primary storage needs to be put in maintenance before doing any upgrade/reboot as mentioned in the previous mails.
>>
>> -Koushik
>>
>> On 03-Mar-2014, at 6:07 AM, Marcus <sh...@gmail.com> wrote:
>>
>>> Also, please note that in the bug you referenced it doesn't have a
>>> problem with the reboot being triggered, but with the fact that reboot
>>> never completes due to hanging NFS mount (which is why the reboot
>>> occurs, inaccessible primary storage).
>>>
>>> On Sun, Mar 2, 2014 at 5:26 PM, Marcus <sh...@gmail.com> wrote:
>>>> Or do you mean you have multiple primary storages and this one was not
>>>> in use and put into maintenance?
>>>>
>>>> On Sun, Mar 2, 2014 at 5:25 PM, Marcus <sh...@gmail.com> wrote:
>>>>> I'm not sure I understand. How do you expect to reboot your primary
>>>>> storage while vms are running?  It sounds like the host is being
>>>>> fenced since it cannot contact the resources it depends on.
>>>>>
>>>>> On Sun, Mar 2, 2014 at 3:24 PM, Nux! <nu...@li.nux.ro> wrote:
>>>>>> On 02.03.2014 21:17, Andrei Mikhailovsky wrote:
>>>>>>> Hello guys,
>>>>>>>
>>>>>>>
>>>>>>> I've recently came across the bug CLOUDSTACK-5429 which has rebooted
>>>>>>> all of my host servers without properly shutting down the guest vms.
>>>>>>> I've simply upgraded and rebooted one of the nfs primary storage
>>>>>>> servers and a few minutes later, to my horror, i've found out that all
>>>>>>> of my host servers have been rebooted. Is it just me thinking so, or
>>>>>>> is this bug should be fixed ASAP and should be a blocker for any new
>>>>>>> ACS release. I mean not only does it cause downtime, but also possible
>>>>>>> data loss and server corruption.
>>>>>> Hi Andrei,
>>>>>>
>>>>>> Do you have HA enabled and did you put that primary storage in maintenance
>>>>>> mode before rebooting it?
>>>>>> It's my understanding that ACS relies on the shared storage to perform HA so
>>>>>> if the storage goes it's expected to go berserk. I've noticed similar
>>>>>> behaviour in Xenserver pools without ACS.
>>>>>> I'd imagine a "cure" for this would be to use network distributed
>>>>>> "filesystems" like GlusterFS or CEPH.
>>>>>>
>>>>>> Lucian
>>>>>>
>>>>>> --
>>>>>> Sent from the Delta quadrant using Borg technology!
>>>>>>
>>>>>> Nux!
>>>>>> www.nux.ro