Posted to dev@cloudstack.apache.org by li jerry <di...@hotmail.com> on 2019/06/06 14:50:17 UTC

Host's HA failed!

Hello all,

We are trying to deploy the CloudStack management servers as VMs running on the compute nodes themselves, similar to a hyper-converged infrastructure, but we have run into an issue with HA failover.
The test is based on CentOS 7.6 and CloudStack 4.11.2. The primary storage is NFS.
We have enabled VM HA and Host HA, and the following global settings are configured in CloudStack:
indirect.agent.lb.algorithm=roundrobin
host=M1IP,M2IP,M3IP
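To show why the agent-to-management-server mapping matters in this layout, here is a simplified model of round-robin assignment: each host's copy of the `host` list is rotated by that host's position in an ordered host list. This is an assumption made for illustration only. It is not CloudStack's actual `indirect.agent.lb.algorithm` code, and the function name `rr_preferred_order` and the host ordering are hypothetical.

```python
from typing import List, Sequence


def rr_preferred_order(ms_list: Sequence[str], ordered_hosts: Sequence[str], host: str) -> List[str]:
    """Hypothetical round-robin model: rotate the management-server list
    so that each host starts at a different entry, based on its position
    in an ordered list of hosts."""
    if len(ms_list) < 2:
        # Nothing to balance with fewer than two management servers.
        return list(ms_list)
    pivot = ordered_hosts.index(host) % len(ms_list)
    return list(ms_list[pivot:]) + list(ms_list[:pivot])


ms = ["M1IP", "M2IP", "M3IP"]      # the `host` setting from the report
hosts = ["H1", "H2", "H3"]         # assumed host ordering
for h in hosts:
    print(h, rr_preferred_order(ms, hosts, h))
```

Under this model, if the hosts happen to be ordered H1, H2, H3, each agent's first choice is the management VM on its own node. That is the situation that triggers the problem in this report.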

We have 3 management VMs running on 3 compute nodes, sharing a MariaDB Galera cluster.
M1, the first management VM, runs on compute node H1; it is created directly under KVM and is not managed by CloudStack. Likewise, M2 and M3 are the second and third management VMs, running on H2 and H3.

The problem is:
If a compute node's CloudStack agent is connected to the management VM running on that same node (for example, H1's agent is connected to M1), then when H1 goes down, CloudStack only updates H1's status to Disconnected and never triggers AgentStatusCheck, so host HA fails.
If H1's agent is connected to another management VM, such as M2 or M3, the issue does not occur.
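The failure condition above can be checked mechanically: a host is at risk when the management VM its agent is connected to runs on that same host. The sketch below assumes the two mappings are already known; how they are collected from a live deployment is not shown, and the function name `colocated_agents` and the example connections are hypothetical.

```python
from typing import Dict, List


def colocated_agents(agent_ms: Dict[str, str], ms_placement: Dict[str, str]) -> List[str]:
    """Return compute hosts whose agent is connected to a management VM
    that runs on that same compute host (the failing case in this report)."""
    return sorted(h for h, ms in agent_ms.items() if ms_placement.get(ms) == h)


# Placement from the report: M1 on H1, M2 on H2, M3 on H3.
placement = {"M1": "H1", "M2": "H2", "M3": "H3"}
# Hypothetical agent connections for illustration.
connections = {"H1": "M1", "H2": "M3", "H3": "M1"}
print(colocated_agents(connections, placement))  # ['H1']
```

Here only H1 is flagged, because its agent talks to M1, which would go down together with H1.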

Could you please give us some suggestions? Thanks.


Re: Host's HA failed!

Posted by Nicolas Vazquez <Ni...@shapeblue.com>.
Hi Li,

Can you post the stack trace of the failure from the logs?


Regards,

Nicolas Vazquez

________________________________
From: li jerry <di...@hotmail.com>
Sent: Thursday, June 6, 2019 11:50 AM
To: dev@cloudstack.apache.org; users@cloudstack.apache.org
Subject: Host's HA failed!


Nicolas.Vazquez@shapeblue.com 
www.shapeblue.com
Amadeus House, Floral Street, London  WC2E 9DPUK
@shapeblue
  
 

