Posted to users@cloudstack.apache.org by Bryan Tiang <br...@hotmail.com> on 2023/10/10 10:35:30 UTC

Cloudstack VM HA

Hi All,

We are setting up Cloudstack + Linbit SDS (via plugin). Hypervisor is Ubuntu.

We are trying to test VM HA by powering down a physical node at random. However, the VMs don’t seem to fail over to the other nodes.

VM HA is already enabled. Is there something we are missing?

Regards,
Bryan

Re: Cloudstack VM HA

Posted by Joan g <jo...@gmail.com>.
Hi Nux,

My deployment uses KVM on CentOS 7 with NFS as primary storage. Even
after enabling HA, the HA State shows as "Ineligible" on all 3 KVM hosts.
Did I miss something?

Reg,
Jon

On Tue, 10 Oct, 2023, 19:06 Nux, <nu...@li.nux.ro> wrote:

> Hello,
>
> You need a stable NFS primary storage for the heartbeat file.
> You can keep it in a disabled state after testing, so that VMs do not get
> created there, but it needs to be present.
> Watch out: if the NFS storage becomes unstable or unreachable over the
> network (switch fault, etc.), the hypervisors will force-reboot themselves.

Re: Cloudstack VM HA

Posted by Nux <nu...@li.nux.ro>.
Hello,

You need a stable NFS primary storage for the heartbeat file.
You can keep it in a disabled state after testing, so that VMs do not get
created there, but it needs to be present.
Watch out: if the NFS storage becomes unstable or unreachable over the
network (switch fault, etc.), the hypervisors will force-reboot themselves.



Re: AW: AW: Cloudstack VM HA

Posted by st...@bienek.org.
Hello Bryan,

Your understanding is correct, to my knowledge and experience.
Host HA restarts all affected VMs on a host that is still running when a host fails.

VM HA restarts a VM when it has failed or is not running even though it should be.
VM HA will even restart the VM if you shut it down at the OS level, because according to CS the VM should be in the running state.

For the test case you described, shutting down a host and expecting all VMs to be restarted on another host, you would need Host HA.

I cannot stress Nux's remark enough: you need a really stable NFS for Host HA on KVM.
We learned this the hard way using Ceph with HA NFS gateways, which in our case were not stable enough for Host HA. As a result, all CloudStack hosts rebooted unexpectedly, for example during Ceph host reboots/updates.

I am very curious to hear about alternatives to NFS-based Host HA at CCC.

Best regards,
Stephan

> me@swen.io wrote on 11.10.2023 at 16:23 CEST:
> 
>  
> At the moment you need an NFS storage, as Nux wrote. Without it you are unable to use Host HA.
> 
> As far as I understand, you can use both HA options, and Host HA will start VMs on another host if a host becomes unavailable. But I am not 100% sure about this.
> 
> Regards,
> Swen
> 
> -----Original Message-----
> From: Bryan Tiang <br...@hotmail.com> 
> Sent: Wednesday, 11 October 2023 15:48
> To: users@cloudstack.apache.org
> Subject: Re: AW: Cloudstack VM HA
> 
> Hi Nux and Swen,
> 
> Thanks for the input! Just curious, can VM HA and Host HA be enabled at the same time?
> 
> In our case, we are using Cloudstack + Linstor.
> 
> And to clarify my understanding: Host HA migrates VMs to another host if Cloudstack detects the physical host to be unhealthy, right? Is that all?
> 
> Regards,
> Bryan
> On 11 Oct 2023 at 7:48 PM +0800, me@swen.io, wrote:
> > Hi Bryan,
> >
> > we are testing the exact same scenario at the moment! :-)
> >
> > As far as I understand, CS has 2 different kinds of HA: VM HA and Host HA. For VM HA, the VM needs to use an offering with HA enabled. CS then checks whether the VM is running and, if it is not, restarts or recreates it. You can test this by destroying a VM with virsh destroy directly on the KVM host. CS will restart this VM.
> >
> > Host HA only works with NFS storage at the moment, as Nux wrote. As far as I know, StorPool is developing a new framework so that other storage types can be used for Host HA in the future. I read something about it on the CCC agenda.
> >
> > Regards,
> > Swen

AW: AW: Cloudstack VM HA

Posted by me...@swen.io.
At the moment you need an NFS storage, as Nux wrote. Without it you are unable to use Host HA.

As far as I understand, you can use both HA options, and Host HA will start VMs on another host if a host becomes unavailable. But I am not 100% sure about this.

Regards,
Swen


Re: AW: Cloudstack VM HA

Posted by Jithin Raju <ji...@shapeblue.com>.
Sharing this wiki page here. The current implementation may have changed since it was written, but it can still be used as a reference:

https://cwiki.apache.org/confluence/display/CLOUDSTACK/High+Availability+Developer%27s+Guide


-Jithin

From: Nux <nu...@li.nux.ro>
Date: Wednesday, 11 October 2023 at 8:38 PM
To: users@cloudstack.apache.org <us...@cloudstack.apache.org>
Cc: Bryan Tiang <br...@hotmail.com>
Subject: Re: AW: Cloudstack VM HA
What I learned in practice is that enabling Host HA affects VM HA, in
that VM HA no longer works. :)

So what does Host HA do? It'll reboot the hypervisor via IPMI if it is
deemed unreachable. While the hypervisor is down or rebooting, the VMs
CANNOT be moved/started on another hypervisor.

What does VM HA do? It'll make sure VMs on an HA offering are
restarted (possibly on another hypervisor) if they are deemed down.
Possible scenarios where VM HA would kick in:
- the hypervisor crashed and Cloudstack marked the VMs on it as down
- a user has powered off the VM from within (poweroff via ssh, for
example); Cloudstack will notice it is down and restart it

As part of VM HA and for data integrity, a hypervisor will keep a
heartbeat file (a sort of lock file) on the NFS primary storage. If
the NFS share has gone away, it will assume it is in a network split or
has lost access to the storage and will forcefully reboot itself. This
is where that happens:

https://github.com/apache/cloudstack/blob/d2ad9363a264290e9e5ee58db4a745cbb0e1c62a/scripts/vm/hypervisor/kvm/kvmheartbeat.sh#L162

HTH


Re: AW: Cloudstack VM HA

Posted by Nux <nu...@li.nux.ro>.
What I learned in practice is that enabling Host HA affects VM HA, in
that VM HA no longer works. :)

So what does Host HA do? It'll reboot the hypervisor via IPMI if it is
deemed unreachable. While the hypervisor is down or rebooting, the VMs
CANNOT be moved/started on another hypervisor.

What does VM HA do? It'll make sure VMs on an HA offering are
restarted (possibly on another hypervisor) if they are deemed down.
Possible scenarios where VM HA would kick in:
- the hypervisor crashed and Cloudstack marked the VMs on it as down
- a user has powered off the VM from within (poweroff via ssh, for
example); Cloudstack will notice it is down and restart it
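
To make the two scenarios above concrete, here is a toy model in Python. It is only a sketch of the decision as described in this mail, not Cloudstack's actual code; the names `Vm`, `ha_enabled`, `expected_state`, `observed_state` and `needs_ha_restart` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Vm:
    name: str
    ha_enabled: bool      # hypothetical flag: VM uses an offering with HA enabled
    expected_state: str   # state the management side believes the VM should be in
    observed_state: str   # state actually found for the VM


def needs_ha_restart(vm: Vm) -> bool:
    """Return True when an HA-enabled VM that should be running is found down."""
    return (
        vm.ha_enabled
        and vm.expected_state == "Running"
        and vm.observed_state != "Running"
    )


vms = [
    Vm("web-1", True, "Running", "Stopped"),    # powered off from inside the guest
    Vm("db-1", True, "Running", "Running"),     # healthy, nothing to do
    Vm("test-1", False, "Running", "Stopped"),  # no HA offering, left alone
    Vm("old-1", True, "Stopped", "Stopped"),    # stopped on purpose, left alone
]

to_restart = [vm.name for vm in vms if needs_ha_restart(vm)]
print(to_restart)
```

Only the VM that is on an HA offering, is expected to be running and is found down gets selected for a restart.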

As part of VM HA and for data integrity, a hypervisor will keep a
heartbeat file (a sort of lock file) on the NFS primary storage. If
the NFS share has gone away, it will assume it is in a network split or
has lost access to the storage and will forcefully reboot itself. This
is where that happens:

https://github.com/apache/cloudstack/blob/d2ad9363a264290e9e5ee58db4a745cbb0e1c62a/scripts/vm/hypervisor/kvm/kvmheartbeat.sh#L162
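
The idea can be sketched roughly as follows, in Python rather than shell. This is a simplified illustration under my own assumptions, not a port of the linked script: `write_heartbeat`, `beat_or_fence` and the `fence` callback are hypothetical names, a temporary directory stands in for the NFS mount, and the "reboot" is just a recorded event.

```python
import os
import tempfile
import time


def write_heartbeat(path: str, now: float) -> None:
    """Write the current timestamp to the heartbeat file."""
    with open(path, "w") as f:
        f.write(f"{int(now)}\n")


def beat_or_fence(path: str, now: float, fence) -> bool:
    """Try to update the heartbeat; call fence() if the storage is unusable."""
    try:
        write_heartbeat(path, now)
    except OSError:
        # Storage is gone or not writable: assume a split and fence this host.
        fence()
        return False
    return True


events = []


def fence() -> None:
    # Stand-in for the forced reboot; here it only records that it was called.
    events.append("reboot")


with tempfile.TemporaryDirectory() as storage:
    # Healthy case: the "share" exists and the heartbeat can be written.
    healthy = beat_or_fence(os.path.join(storage, "hb-host1"), time.time(), fence)
    # Failure case: the directory is missing, as if the share had gone away.
    broken = beat_or_fence(os.path.join(storage, "gone", "hb-host1"), time.time(), fence)

print(healthy, broken, events)
```

The point of the sketch is the failure path: the host cannot tell a storage outage from a network split, so it fences itself in both cases, which is why an unstable NFS share leads to reboots.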

HTH


Re: AW: Cloudstack VM HA

Posted by Bryan Tiang <br...@hotmail.com>.
Hi Nux and Swen,

Thanks for the input! Just curious, can VM HA and Host HA be enabled at the same time?

In our case, we are using Cloudstack + Linstor.

And to clarify my understanding: Host HA migrates VMs to another host if Cloudstack detects the physical host to be unhealthy, right? Is that all?

Regards,
Bryan

AW: Cloudstack VM HA

Posted by me...@swen.io.
Hi Bryan,

we are testing the exact same scenario at the moment! :-)

As far as I understand, CS has 2 different kinds of HA: VM HA and Host HA. For VM HA, the VM needs to use an offering with HA enabled. CS then checks whether the VM is running and, if it is not, restarts or recreates it. You can test this by destroying a VM with virsh destroy directly on the KVM host. CS will restart this VM.

Host HA only works with NFS storage at the moment, as Nux wrote. As far as I know, StorPool is developing a new framework so that other storage types can be used for Host HA in the future. I read something about it on the CCC agenda.
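
For anyone who wants to script this test, a rough Python sketch of the "wait for CS to bring the VM back" part could look like the one below. It assumes you have already run virsh destroy on the domain by hand. The `get_state` callable is a hypothetical placeholder for however you read the VM state (UI, API client, etc.), and the timeout values are arbitrary.

```python
import time
from typing import Callable


def wait_until_running(
    get_state: Callable[[], str],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_state() until it returns "Running" or the timeout is used up."""
    waited = 0.0
    while True:
        if get_state() == "Running":
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


# Dry run with canned states instead of a real CloudStack lookup.
states = iter(["Stopped", "Stopped", "Starting", "Running"])
recovered = wait_until_running(lambda: next(states), timeout=60, interval=5,
                               sleep=lambda seconds: None)

# A VM that never comes back makes the helper give up after the timeout.
stuck = wait_until_running(lambda: "Stopped", timeout=10, interval=5,
                           sleep=lambda seconds: None)
print(recovered, stuck)
```

Passing `sleep` in as a parameter keeps the dry run instant; in a real test you would leave the default and only supply `get_state`.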

Regards,
Swen
