Posted to dev@cloudstack.apache.org by Sean Lair <sl...@ippathways.com> on 2018/02/13 23:06:19 UTC

System VMs not migrating when host down

Hi all,

We are testing VM HA and are having a problem with our system VMs (secondary storage and console) not being started up on another host when a host fails.

Shouldn't the system VMs be VM HA-enabled?  Currently they are just in an "Alert" agent state, but never migrate.  We are currently running 4.9.3.


Thanks
Sean

Re: System VMs not migrating when host down

Posted by Simon Weller <sw...@ena.com.INVALID>.
Hey Andrija,


So it sounds like your primary storage isn't enforcing an exclusive lock.  How is your storage exposed to ACS?


We've found that HA doesn't work at all with a host failure on KVM, as those VMs will never be restarted until the host is either recovered, or the host is removed from ACS. We are running a heavily patched 4.8.

- Si
________________________________
From: Andrija Panic <an...@gmail.com>
Sent: Wednesday, February 14, 2018 3:22 AM
To: dev
Subject: Re: System VMs not migrating when host down

Humble opinion (until HOST HA is ready in 4.11 if not mistaken?), avoid
using HA option for VMs  - avoid setting the  "Offer HA" option on any
compute/service offerings, since we did end  up (was it ACS 4.5 or 4.8,
can't remember now) having 2 copies of SAME VM running on 2 different
hosts...imagine storage/volume corruption...this happened a few times for
us.

HOST HA looks like a really nice thing. I have not tested it yet...but it
should completely solve the problem.

On 14 February 2018 at 10:14, Paul Angus <pa...@shapeblue.com> wrote:

> Hi Sean,
>
> The 'problem' with VM HA in KVM is that it relies on the parent host agent
> being connected to report that the VM is down.  We cannot assume that the
> VMs on a host are not running just because its host agent is disconnected.
>
> This is where HOST HA comes in: this feature detects loss of connection to
> the agent, then tries to determine whether the VMs on that host are active,
> and then attempts some corrective action.
>
>
> Kind regards,
>
> Paul Angus
>
> paul.angus@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue



--

Andrija Panić

Re: System VMs not migrating when host down

Posted by Andrija Panic <an...@gmail.com>.
Sean,

This patch (the locking approach) looks interesting.
If you decide to go with it, please make sure it will not be a problem for
the 4.11 release, or for the HA mechanism in general. For example: a host is
UP, then gets disconnected/down/crashes/whatever, so it gets KILLED
(IPMI/whatever), but the image file is somehow still locked when another host
tries to start the VM with the same image. This could break the HA mechanism
(just a theory). If it doesn't break it, then it would be a very nice
addition to ACS on KVM.


Cheers

On 17 February 2018 at 20:27, Andrija Panic <an...@gmail.com> wrote:

> Hi Simon,
>
> Same here: 4.8, heavily patched. We were on NFS and/or Ceph (also KVM)
> back when we hit these issues.
>
> @Sean, this is a really interesting finding - at least to avoid 2 VMs
> running on top of the same image, but otherwise it doesn't solve the HA
> mechanism (4.11 is supposed to...)
>
> Thx for the info guys
>
>
>
> On 15 February 2018 at 23:39, Sean Lair <sl...@ippathways.com> wrote:
>
>> Thanks for the replies everyone.
>>
>> After further investigating, I am seeing how broken VM HA is right now
>> (at least in 4.9.3).
>>
>> We've started patching the code so it works again, but once we fixed it -
>> we hit the dreaded VMs running on 2 different hosts... not good!
>>
>> We are KVM w/ NFS.  It looks like the standard CloudStack documentation
>> doesn't say to use the built-in locking mechanism in libvirtd.  It looks
>> like an easy solution: if we are locking the VM's disk files, the VM
>> shouldn't be able to come up on another host...
>>
>> I've seen some of the talk about IPMI being used for Host HA in 4.11...
>> but we don't have IPMI set up yet.  The locking mechanisms in libvirtd seem
>> like the best idea to us so far - but we are just starting to look into it
>> and implement it.
>>
>> https://libvirt.org/locking-lockd.html
>>
>> It reminds us of how VMware vSphere does locking, which works great.



-- 

Andrija Panić

Re: System VMs not migrating when host down

Posted by Andrija Panic <an...@gmail.com>.
Hi Simon,

Same here: 4.8, heavily patched. We were on NFS and/or Ceph (also KVM) back
when we hit these issues.

@Sean, this is a really interesting finding - at least to avoid 2 VMs running
on top of the same image, but otherwise it doesn't solve the HA mechanism
(4.11 is supposed to...)

Thx for the info guys






-- 

Andrija Panić

RE: System VMs not migrating when host down

Posted by Sean Lair <sl...@ippathways.com>.
Thanks for the replies everyone. 

After further investigating, I am seeing how broken VM HA is right now (at least in 4.9.3).

We've started patching the code so it works again, but once we fixed it - we hit the dreaded VMs running on 2 different hosts... not good!

We are KVM w/ NFS.  It looks like the standard CloudStack documentation doesn't say to use the built-in locking mechanism in libvirtd.  It looks like an easy solution: if we are locking the VM's disk files, the VM shouldn't be able to come up on another host...

I've seen some of the talk about IPMI being used for Host HA in 4.11... but we don't have IPMI set up yet.  The locking mechanisms in libvirtd seem like the best idea to us so far - but we are just starting to look into it and implement it.

https://libvirt.org/locking-lockd.html

It reminds us of how VMware vSphere does locking, which works great.
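
The idea of refusing a second start while a VM's disk file is locked can be illustrated with a plain advisory file lock. This is only a minimal sketch under stated assumptions: it is not libvirt's virtlockd and not CloudStack code, `try_lock_image` and `release_image` are hypothetical helper names, and it uses `flock` on a local temporary file on a Unix-like system. Whether locks behave the same way over NFS depends on the NFS setup and is not shown here.

```python
import fcntl
import os
import tempfile


def try_lock_image(path: str):
    """Try to take an exclusive, non-blocking lock on a disk image file.

    Returns the open file descriptor on success (keep it open to hold the
    lock), or None if another holder already has the lock.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release_image(fd: int) -> None:
    """Release the lock by closing the descriptor that holds it."""
    os.close(fd)


if __name__ == "__main__":
    # A temporary file stands in for a VM disk image (hypothetical name).
    with tempfile.NamedTemporaryFile(suffix=".qcow2") as image:
        host_a = try_lock_image(image.name)  # first "host" starts the VM
        host_b = try_lock_image(image.name)  # second "host" tries the same image
        print("host A holds lock:", host_a is not None)
        print("host B holds lock:", host_b is not None)
        if host_a is not None:
            release_image(host_a)
```

The point of the sketch is the ordering: the lock is taken before the VM would be started, and a failed lock means the start is refused rather than retried blindly.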

 


Re: System VMs not migrating when host down

Posted by Andrija Panic <an...@gmail.com>.
Humble opinion (until HOST HA is ready in 4.11 if not mistaken?), avoid
using HA option for VMs  - avoid setting the  "Offer HA" option on any
compute/service offerings, since we did end  up (was it ACS 4.5 or 4.8,
can't remember now) having 2 copies of SAME VM running on 2 different
hosts...imagine storage/volume corruption...this happened a few times for
us.

HOST HA looks like a really nice thing. I have not tested it yet...but it
should completely solve the problem.




-- 

Andrija Panić

RE: System VMs not migrating when host down

Posted by Paul Angus <pa...@shapeblue.com>.
Hi Sean,

The 'problem' with VM HA in KVM is that it relies on the parent host agent being connected to report that the VM is down.  We cannot assume that the VMs on a host are not running just because its host agent is disconnected.

This is where HOST HA comes in: this feature detects loss of connection to the agent, then tries to determine whether the VMs on that host are active, and then attempts some corrective action.
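
As a rough illustration of the flow described above, the decision could be sketched as follows. This is not CloudStack's implementation: `HostStatus`, `Action`, and `decide` are hypothetical names, the activity probe is reduced to a boolean, and the nature of the "corrective action" is left open because it is not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"                            # agent connected: it reports VM state itself
    KEEP_WATCHING = "keep_watching"          # agent lost, but VMs still look active
    CORRECTIVE_ACTION = "corrective_action"  # agent lost and no VM activity seen


@dataclass
class HostStatus:
    agent_connected: bool   # is the host agent still connected?
    vm_activity_seen: bool  # hypothetical probe result for VMs on that host


def decide(host: HostStatus) -> Action:
    """Pick the next step for a host based on agent and VM activity state."""
    if host.agent_connected:
        # The agent can report VM state, so nothing extra is needed.
        return Action.NONE
    if host.vm_activity_seen:
        # A disconnected agent does not prove the VMs are down, so
        # restarting them elsewhere would risk running two copies.
        return Action.KEEP_WATCHING
    return Action.CORRECTIVE_ACTION


if __name__ == "__main__":
    print(decide(HostStatus(agent_connected=False, vm_activity_seen=True)))
```

The middle branch is the important one: it encodes the point that a disconnected agent alone is not enough evidence to restart VMs on another host.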


Kind regards,

Paul Angus

paul.angus@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 

