Posted to users@cloudstack.apache.org by Jon Marshall <jm...@hotmail.co.uk> on 2019/03/04 11:11:35 UTC

KVM Host HA and power lost to host.

I have KVM Host HA enabled and power is lost to one of the compute nodes. The host has its state marked as Alert and its HA state goes from Degraded to Suspect to Fencing.

The problem is that the host is never fenced: with no power to it, none of the OOBM commands work, so the VMs are never migrated.
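
The sequence described above can be pictured with a toy model. This is only a sketch: the state names follow the ones mentioned in this message, and the `next_state` and `simulate` helpers are hypothetical, not CloudStack code. It assumes the host's health checks keep failing and that fencing only completes once the out-of-band power-off succeeds.

```python
from enum import Enum


class HAState(Enum):
    AVAILABLE = "Available"
    DEGRADED = "Degraded"
    SUSPECT = "Suspect"
    FENCING = "Fencing"
    FENCED = "Fenced"


def next_state(state, oobm_power_off_ok):
    """Advance the toy model one step for a host whose health checks keep failing."""
    if state is HAState.AVAILABLE:
        return HAState.DEGRADED
    if state is HAState.DEGRADED:
        return HAState.SUSPECT
    if state is HAState.SUSPECT:
        return HAState.FENCING
    if state is HAState.FENCING:
        # Assumption: fencing completes only if the OOBM power-off succeeds.
        return HAState.FENCED if oobm_power_off_ok else HAState.FENCING
    return state


def simulate(steps, oobm_power_off_ok):
    """Return the list of states visited, starting from Available."""
    state = HAState.AVAILABLE
    history = [state]
    for _ in range(steps):
        state = next_state(state, oobm_power_off_ok)
        history.append(state)
    return history


# With the BMC unpowered, the power-off never succeeds and the host stays in Fencing.
print([s.value for s in simulate(6, oobm_power_off_ok=False)])
```

In this model the host never reaches Fenced while the power-off keeps failing, which matches the behaviour reported here.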

From the management server logs:

2019-03-04 11:02:48,288 WARN  [o.a.c.h.t.BaseHATask] (pool-6-thread-9:null) (logid:d0a19f20) Exception occurred while running FenceTask on a resource: org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not configured or enabled for this host dcp-cscn2.local
org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not configured or enabled for this host dcp-cscn2.local
        at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:99)
        at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:42)
        at org.apache.cloudstack.ha.task.FenceTask.performAction(FenceTask.java:42)
        at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:86)
        at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band Management action (OFF) on host (b53122bc-1446-4ffd-a179-e363ad0d541f) failed with error: Get Auth Capabilities error
Error issuing Get Channel Authentication Capabilities request
Error: Unable to establish IPMI v2 / RMCP+ session

        at org.apache.cloudstack.outofbandmanagement.OutOfBandManagementServiceImpl.executePowerOperation(OutOfBandManagementServiceImpl.java:423)
        at sun.reflect.GeneratedMethodAccessor225.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        ... 21 more


This raises the question of how Host HA is meant to work for a host whose power has failed.


If I turn off KVM Host HA and change the ping interval to 30 and the ping timeout to 2, the VMs fail over to another host within 5 minutes.
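
As a rough check on those numbers, the sketch below assumes that the ping timeout acts as a multiplier on the ping interval (in seconds) before the agent is treated as timed out. That reading of the two settings is an assumption, and the function name is hypothetical.

```python
def agent_timeout_seconds(ping_interval_s, ping_timeout_multiplier):
    """Seconds without a ping before the agent is treated as timed out.

    Assumption: the timeout setting is a multiplier applied to the interval.
    """
    return ping_interval_s * ping_timeout_multiplier


# Values from this message: ping interval 30, ping timeout 2.
print(agent_timeout_seconds(30, 2))  # 60
```

Under that assumption the host would be declared unreachable after about a minute, and the rest of the observed 5 minutes would go on investigation and restarting the VMs elsewhere.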

I understand what Host HA is meant for, but it does not seem to work for a host that has lost power.

Jon

Re: KVM Host HA and power lost to host.

Posted by Jon Marshall <jm...@hotmail.co.uk>.
Hi Andrija

Thanks for responding.

Makes sense I guess, just wanted to make sure I wasn't missing anything obvious.


Jon

________________________________
From: Andrija Panic <an...@gmail.com>
Sent: 04 March 2019 13:43
To: users
Subject: Re: KVM Host HA and power lost to host.

Jon,

I'm not an expert on this particular implementation, but obviously your host
needs power so that its IPMI/BMC/iLO/iDRAC/etc. controller can be contacted
and the host fenced. Redundant PSUs on different power sources are expected
(the de facto standard in production).

Kind regards,
Andrija


--

Andrija Panić

Re: KVM Host HA and power lost to host.

Posted by Andrija Panic <an...@gmail.com>.
Jon,

I'm not an expert on this particular implementation, but obviously your host
needs power so that its IPMI/BMC/iLO/iDRAC/etc. controller can be contacted
and the host fenced. Redundant PSUs on different power sources are expected
(the de facto standard in production).
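
One way to confirm whether the controller can be contacted is to query it directly. The sketch below is a minimal example that assumes `ipmitool` is installed and that the BMC speaks IPMI v2 over LAN. The address and credentials passed in are placeholders, and the helper names are hypothetical. The error string is the one from the management server log quoted in this thread.

```python
import subprocess

# Error text copied from the management server log in this thread.
SESSION_ERROR = "Unable to establish IPMI v2 / RMCP+ session"


def build_power_status_command(bmc_address, username, password):
    """Build an ipmitool command that asks the BMC for the chassis power status."""
    return [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_address,
        "-U", username,
        "-P", password,
        "chassis", "power", "status",
    ]


def bmc_unreachable(output):
    """True if the output contains the session error seen in the log."""
    return SESSION_ERROR in output


def check_bmc(bmc_address, username, password, timeout_s=15):
    """Return True if the BMC answered the power status query."""
    cmd = build_power_status_command(bmc_address, username, password)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # ipmitool is missing, or the BMC did not answer in time.
        return False
    return result.returncode == 0 and not bmc_unreachable(result.stdout + result.stderr)
```

If `check_bmc` returns False for a host, fencing through out-of-band management cannot be expected to succeed for that host either.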

Kind regards,
Andrija


-- 

Andrija Panić