Posted to users@cloudstack.apache.org by Andrija Panic <an...@gmail.com> on 2019/03/04 13:43:35 UTC

Re: KVM Host HA and power lost to host.

Jon,

I'm not an expert on this particular implementation, but your host needs
power so that its IPMI/BMC/iLO/iDRAC/etc. controller can be contacted and
the host fenced. Redundant PSUs fed from different power sources are
expected (the de facto standard in production).
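
To illustrate the dependency described above, here is a minimal sketch (not
CloudStack's actual code) of why a fence cannot be confirmed when the BMC has
no power. It assumes ipmitool is installed and the BMC speaks IPMI v2
(lanplus); the function names, the BMC address, and the credentials are
hypothetical and only there for illustration.

```python
import subprocess
from typing import Callable, List


def power_status_command(bmc_host: str, user: str, password: str) -> List[str]:
    """Build the ipmitool call that asks the BMC for the chassis power state."""
    return [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_host, "-U", user, "-P", password,
        "chassis", "power", "status",
    ]


def classify(returncode: int, output: str) -> str:
    """Map ipmitool's result to 'on', 'off' or 'unreachable'."""
    text = output.lower()
    if returncode == 0 and "chassis power is off" in text:
        return "off"
    if returncode == 0 and "chassis power is on" in text:
        return "on"
    # Any error (e.g. "Unable to establish IPMI v2 / RMCP+ session")
    # means the BMC did not answer, so the power state is unknown.
    return "unreachable"


def can_confirm_fence(state: str) -> bool:
    """A host counts as fenced only if the BMC itself reports power off."""
    return state == "off"


def probe(bmc_host: str, user: str, password: str,
          run: Callable = subprocess.run, timeout: int = 10) -> str:
    """Query the BMC; 'run' is injectable so the logic can be tried offline."""
    try:
        result = run(power_status_command(bmc_host, user, password),
                     capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return "unreachable"
    return classify(result.returncode, result.stdout + result.stderr)
```

A host with no power at all falls into the "unreachable" case, which is
indistinguishable from a network problem on the management LAN. That is why
the fence cannot be confirmed and the VMs are not restarted elsewhere.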

Kind regards,
Andrija

On Mon, 4 Mar 2019 at 12:19, Jon Marshall <jm...@hotmail.co.uk> wrote:

>
> I have KVM Host HA enabled and power is lost to one of the compute nodes.
> The host has its state marked as alert and the HA states go through
> degraded to suspect to Fencing.
>
> The problem is that the host is never fenced because there is no power to
> it, so none of the OOBM commands work, which means the VMs are never migrated.
>
>  From the management server logs -
>
> 2019-03-04 11:02:48,288 WARN  [o.a.c.h.t.BaseHATask]
> (pool-6-thread-9:null) (logid:d0a19f20) Exception occurred while running
> FenceTask on a resource:
> org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not
> configured or enabled for this host dcp-cscn2.local
> org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not
> configured or enabled for this host dcp-cscn2.local
>         at
> org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:99)
>         at
> org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:42)
>         at
> org.apache.cloudstack.ha.task.FenceTask.performAction(FenceTask.java:42)
>         at
> org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:86)
>         at
> org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:83)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band
> Management action (OFF) on host (b53122bc-1446-4ffd-a179-e363ad0d541f)
> failed with error: Get Auth Capabilities error
> Error issuing Get Channel Authentication Capabilities request
> Error: Unable to establish IPMI v2 / RMCP+ session
>
>         at
> org.apache.cloudstack.outofbandmanagement.OutOfBandManagementServiceImpl.executePowerOperation(OutOfBandManagementServiceImpl.java:423)
>         at sun.reflect.GeneratedMethodAccessor225.invoke(Unknown Source)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         ... 21 more
>
>
> which raises the question: how is this meant to work for a host whose
> power has failed?
>
>
> If I turn off KVM Host HA and change the ping interval to 30 and ping
> timeout to 2, then the VMs fail over to another host within 5 minutes.
>
> I understand what Host HA is meant for, but it seems that it doesn't work
> for a host that has lost power.
>
> Jon
>


-- 

Andrija Panić

Re: KVM Host HA and power lost to host.

Posted by Jon Marshall <jm...@hotmail.co.uk>.
Hi Andrija

Thanks for responding.

Makes sense I guess, just wanted to make sure I wasn't missing anything obvious.


Jon
