Posted to dev@cloudstack.apache.org by li jerry <di...@hotmail.com> on 2019/06/22 04:19:40 UTC

KVM HA fails under multiple management services

Hello everyone,
I recently tested a setup with multiple management servers, using agent LB together with host HA (KVM). In extreme cases HA fails; the details are as follows:


Two management nodes, M1 (172.17.1.141) and M2 (172.17.1.142), share an external database cluster
Three KVM nodes, H1, H2, H3
An external NFS primary storage


CloudStack parameter configuration:
indirect.agent.lb.algorithm=static
indirect.agent.lb.check.interval=0
host=172.17.1.141,172.17.1.142


According to agent.log, all KVM agents are connected to the first management server in the list, M1 (172.17.1.141):

INFO [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:b30323e4) Processed new management server list: 172.17.1.141,172.17.1.142@static
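
A minimal sketch of why every agent ends up on M1 under the static algorithm. This is an illustrative model only, not CloudStack's implementation: the function name `preferred_order` and the rotation used for roundrobin are assumptions made for the example.

```python
import random

def preferred_order(ms_list, algorithm, host_index=0, rng=None):
    """Return the management server list as one agent might receive it.

    Hypothetical model for illustration; the real ordering logic lives in
    CloudStack and may differ in detail.
    """
    servers = list(ms_list)
    if algorithm == "static":
        # Every host gets the list exactly as configured in "host".
        return servers
    if algorithm == "roundrobin":
        # Assumed model: rotate the list by the host's position.
        shift = host_index % len(servers)
        return servers[shift:] + servers[:shift]
    if algorithm == "shuffle":
        # Random order per host.
        (rng or random).shuffle(servers)
        return servers
    raise ValueError(f"unknown algorithm: {algorithm}")

ms = ["172.17.1.141", "172.17.1.142"]
hosts = ["H1", "H2", "H3"]

# First entry of each host's list is its preferred management server.
static_first = {h: preferred_order(ms, "static", i)[0] for i, h in enumerate(hosts)}
rr_first = {h: preferred_order(ms, "roundrobin", i)[0] for i, h in enumerate(hosts)}
print(static_first)
print(rr_first)
```

In this model the static setting gives all three hosts 172.17.1.141 as the first entry, which matches the log line above.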



In extreme cases:
If a KVM host and its preferred management server fail at the same time, HA detection is not triggered for that KVM host.

E.g.:

M1+H1 powered off at the same time: the state of H1 remains Disconnected, and the VMs on H1 are not restarted on other KVM nodes;
M1+H2 powered off at the same time: the state of H2 remains Disconnected, and the VMs on H2 are not restarted on other KVM nodes;
M1+H3 powered off at the same time: the state of H3 remains Disconnected, and the VMs on H3 are not restarted on other KVM nodes;

Re: KVM HA fails under multiple management services

Posted by Andrija Panic <an...@gmail.com>.
Li,

please test with indirect.agent.lb.check.interval=60 or similar, not 0
(zero), since 0 means the agent won't reconnect. This should address your
concern.
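
A small sketch of the interval semantics described here, under the assumption that the check is a periodic attempt by an agent to return to its preferred management server. The function `next_action` and its return values are hypothetical, chosen only for illustration.

```python
def next_action(connected_to, preferred, check_interval, preferred_is_up):
    """Decide what an agent does when a periodic check would fire.

    Hypothetical model: check_interval is in seconds and 0 disables
    the check, as described in this thread.
    """
    if check_interval <= 0:
        return "stay"  # checking disabled: never looks for the preferred server
    if connected_to == preferred:
        return "stay"  # already where it wants to be
    return "reconnect" if preferred_is_up else "stay"

print(next_action("M2", "M1", 0, True))
print(next_action("M2", "M1", 60, True))
```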

As for what sits in which rack, it is your responsibility to distribute
infrastructure components appropriately, i.e. across racks and so on.
We can't handle every case in that regard; I hope you understand.

Andrija

On Mon, 24 Jun 2019 at 02:24, li jerry <di...@hotmail.com> wrote:

> Thank you, Nicolas and Andrija.
>
> Even if indirect.agent.lb.algorithm is set to roundrobin, this only
> reduces the probability of failure; it does not fully prevent KVM HA
> from failing.
>
> In extreme cases, the management server and the KVM host may fail at the
> same time (for example, when the management server and the KVM host are
> placed in the same rack and the whole rack loses power).
>
>
> E.g.:
>
> H1 is assigned and connected to M2
> H2 is assigned and connected to M3
> H3 is assigned and connected to M1
>
> When H1 and M2 fail simultaneously, host HA for H1 will not take effect.
>
> Should we have other protection mechanisms to avoid this?

-- 

Andrija Panić

KVM HA fails under multiple management services

Posted by li jerry <di...@hotmail.com>.
Thank you, Nicolas and Andrija.

Even if indirect.agent.lb.algorithm is set to roundrobin, this only reduces the probability of failure; it does not fully prevent KVM HA from failing.

In extreme cases, the management server and the KVM host may fail at the same time (for example, when the management server and the KVM host are placed in the same rack and the whole rack loses power).


E.g.:

H1 is assigned and connected to M2
H2 is assigned and connected to M3
H3 is assigned and connected to M1

When H1 and M2 fail simultaneously, host HA for H1 will not take effect.

Should we have other protection mechanisms to avoid this?
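
To make the scenario concrete, here is a minimal sketch that groups hosts by the management server they are connected to. The helper `exposed_hosts` is hypothetical, and the static assignment is taken from the original test setup; the point is only that every host always has exactly one server whose simultaneous loss reproduces the case above, whichever algorithm is used.

```python
def exposed_hosts(assignment):
    """Group hosts by the management server they are connected to.

    Illustration only: a host counts as "exposed" to a server if losing
    both at once would match the failure case described above.
    """
    exposed = {}
    for host, server in assignment.items():
        exposed.setdefault(server, []).append(host)
    return exposed

static = {"H1": "M1", "H2": "M1", "H3": "M1"}
roundrobin = {"H1": "M2", "H2": "M3", "H3": "M1"}

static_exposed = exposed_hosts(static)
rr_exposed = exposed_hosts(roundrobin)
print(static_exposed)
print(rr_exposed)
```

With static, a failure of M1 exposes all three hosts; with roundrobin, each server exposes one host, but no host is ever unexposed.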

From: Nicolas Vazquez<ma...@shapeblue.com>
Sent: 23 June 2019 23:31
To: dev@cloudstack.apache.org<ma...@cloudstack.apache.org>; users<ma...@cloudstack.apache.org>
Cc: dev@cloudstack.apache.org<ma...@cloudstack.apache.org>
Subject: Re: KVM HA fails under multiple management services

As Andrija mentioned, that is expected behavior, as the global setting is 'static'. It is also expected that your agents connect to the next management server on the 'host' list once the management server they are connected to is down.
You can find more information on this feature at this link: https://www.shapeblue.com/software-based-agent-lb-for-cloudstack/

Please note this is a different feature from host HA, in which CloudStack will try to recover hosts that are off via IPMI.

Re: KVM HA fails under multiple management services

Posted by Nicolas Vazquez <Ni...@shapeblue.com>.
As Andrija mentioned, that is expected behavior, as the global setting is 'static'. It is also expected that your agents connect to the next management server on the 'host' list once the management server they are connected to is down.
You can find more information on this feature at this link: https://www.shapeblue.com/software-based-agent-lb-for-cloudstack/

Please note this is a different feature from host HA, in which CloudStack will try to recover hosts that are off via IPMI.
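
The failover to the next server on the 'host' list can be sketched roughly as follows. This is a simplified model, not the agent's actual code: `next_management_server` and the `is_up` callback are hypothetical names introduced for the example.

```python
def next_management_server(ms_list, current, is_up):
    """Return the next reachable server after `current`, wrapping around.

    Returns None when no server on the list is reachable. `is_up` is a
    caller-supplied reachability check (an assumption of this sketch).
    """
    start = ms_list.index(current)
    for offset in range(1, len(ms_list) + 1):
        candidate = ms_list[(start + offset) % len(ms_list)]
        if is_up(candidate):
            return candidate
    return None

ms_list = ["172.17.1.141", "172.17.1.142"]
alive = {"172.17.1.142"}  # 172.17.1.141 (M1) is down in this example

chosen = next_management_server(ms_list, "172.17.1.141", lambda s: s in alive)
print(chosen)
```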


From: Andrija Panic
Sent: Sunday, 23 June, 11:03
Subject: Re: KVM HA fails under multiple management services
To: users
Cc: dev@cloudstack.apache.org


Li,

based on the Global Setting descriptions for those two, I would say that is
the expected behaviour.
I.e. change indirect.agent.lb.check.interval to some other value, since 0
means "don't check, don't reconnect", per what I read.

Also, you might want to change indirect.agent.lb.algorithm=static to
some other value, since static means all your KVM agents will always
connect to the one management server that is first in the "host"
list.

Regards,
Andrija


Nicolas.Vazquez@shapeblue.com 
www.shapeblue.com
Amadeus House, Floral Street, London  WC2E 9DP UK
@shapeblue
  
 

Re: KVM HA fails under multiple management services

Posted by Andrija Panic <an...@gmail.com>.
Li,

based on the Global Setting descriptions for those two, I would say that is
the expected behaviour.
I.e. change indirect.agent.lb.check.interval to some other value, since 0
means "don't check, don't reconnect", per what I read.

Also, you might want to change indirect.agent.lb.algorithm=static to
some other value, since static means all your KVM agents will always
connect to the one management server that is first in the "host"
list.

Regards,
Andrija


-- 

Andrija Panić
