Posted to issues@cloudstack.apache.org by "Sanjeev N (JIRA)" <ji...@apache.org> on 2013/12/23 14:05:53 UTC

[jira] [Created] (CLOUDSTACK-5610) [Hyper-v] Host does not go into Alert state even though it is power-off hence vm deployment fails

Sanjeev N created CLOUDSTACK-5610:
-------------------------------------

             Summary: [Hyper-v] Host does not go into Alert state even though it is power-off hence vm deployment fails
                 Key: CLOUDSTACK-5610
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5610
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Hypervisor Controller, Management Server
    Affects Versions: 4.3.0
         Environment: Latest build from 4.3 with commit :d462db4ae5c30e677d5810111f9ea5ca6812bce2
Storage: SMB for both primary and secondary
Hypervisor: Hyper-v
            Reporter: Sanjeev N
            Priority: Blocker
             Fix For: 4.3.0


[Hyper-v] Host does not go into Alert state even though it is power-off hence vm deployment fails

Steps to Reproduce:
=================
1. Bring up CS in an advanced zone with 2 or more Hyper-v hosts, using SMB for both primary and secondary storage
2. Enable the zone and deploy a few vms. Make sure that the vms are distributed across all the hosts
3. Power off one of the hosts on which vms are running

Expected Result:
==============
Host should go into Alert state and all the vms running on it should be stopped

Actual Result:
============
Host remains in Up state and all the vms on it still show as Running.

I could see the ping commands to the Hypervisor agent and the system vm agents in the MS log. Even though the agents are behind on ping, the agent status remains in Up state.

In this state, I tried to deploy a vm, and the deployment planner chose the host which was powered off. Hence the vm deployment failed.
Also, the CPVM was running on the powered-off host, and it too remained in Running state. Since the CPVM agent is not reachable from CS, it should have been stopped and started on another host in the cluster.

2013-12-23 18:19:25,334 ERROR [c.c.h.h.r.HypervDirectConnectResource] (DirectAgent-331:ctx-831c60e9) org.apache.http.conn.HttpHostConnectException: Connection to http://10.147.40.31:8250 refused
2013-12-23 18:19:25,334 INFO  [c.c.h.h.r.HypervDirectConnectResource] (DirectAgent-331:ctx-831c60e9) Cannot ping host 10.147.40.31 (IP 10.147.40.31), pingAns (blank means null) is:com.cloud.agent.api.UnsupportedAnswer
2013-12-23 18:19:25,334 WARN  [c.c.a.m.DirectAgentAttache] (DirectAgent-331:ctx-831c60e9) Unable to get current status on 5(10.147.40.31)
2013-12-23 18:19:25,336 INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-16:ctx-be3804c7) Investigating why host 5 has disconnected with event AgentDisconnected
2013-12-23 18:19:25,336 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-16:ctx-be3804c7) checking if agent (5) is alive
2013-12-23 18:19:25,339 DEBUG [c.c.a.t.Request] (AgentTaskPool-16:ctx-be3804c7) Seq 5-1482556239: Sending  { Cmd , MgmtId: 132129494109518, via: 5(10.147.40.31), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckHealthCommand":{"wait":50}}] }
2013-12-23 18:19:25,339 DEBUG [c.c.a.t.Request] (AgentTaskPool-16:ctx-be3804c7) Seq 5-1482556239: Executing:  { Cmd , MgmtId: 132129494109518, via: 5(10.147.40.31), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckHealthCommand":{"wait":50}}] }
2013-12-23 18:19:25,339 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-325:ctx-39f5ed39) Seq 5-1482556239: Executing request
2013-12-23 18:19:25,339 DEBUG [c.c.h.h.r.HypervDirectConnectResource] (DirectAgent-325:ctx-39f5ed39) POST request tohttp://10.147.40.31:8250/api/HypervResource/com.cloud.agent.api.CheckHealthCommand with contents{"contextMap":{},"wait":50}
2013-12-23 18:19:25,340 DEBUG [c.c.h.h.r.HypervDirectConnectResource] (DirectAgent-325:ctx-39f5ed39) Sending cmd to http://10.147.40.31:8250/api/HypervResource/com.cloud.agent.api.CheckHealthCommand cmd data:{"contextMap":{},"wait":50}

2013-12-23 18:19:46,345 DEBUG [c.c.h.UserVmDomRInvestigator] (AgentTaskPool-16:ctx-be3804c7) checking if agent (5) is alive
2013-12-23 18:19:46,347 DEBUG [c.c.h.UserVmDomRInvestigator] (AgentTaskPool-16:ctx-be3804c7) sending ping from (1) to agent's host ip address (10.147.40.31)
2013-12-23 18:19:46,349 DEBUG [c.c.a.t.Request] (AgentTaskPool-16:ctx-be3804c7) Seq 1-790364876: Sending  { Cmd , MgmtId: 132129494109518, via: 1(10.147.40.14), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.PingTestCommand":{"_computingHostIp":"10.147.40.31","wait":20}}] }
2013-12-23 18:19:46,349 DEBUG [c.c.a.t.Request] (AgentTaskPool-16:ctx-be3804c7) Seq 1-790364876: Executing:  { Cmd , MgmtId: 132129494109518, via: 1(10.147.40.14), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.PingTestCommand":{"_computingHostIp":"10.147.40.31","wait":20}}] }
2013-12-23 18:19:46,350 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-353:ctx-a48feb80) Seq 1-790364876: Executing request
2013-12-23 18:19:46,350 INFO  [c.c.h.h.r.HypervDirectConnectResource] (DirectAgent-353:ctx-a48feb80) Executing resource PingTestCommand: {"_computingHostIp":"10.147.40.31","contextMap":{},"wait":20}
2013-12-23 18:19:46,351 ERROR [c.c.h.h.r.HypervDirectConnectResource] (DirectAgent-353:ctx-a48feb80) Unable to execute ping command on DomR (null), domR may not be ready yet. failure due to There was a problem while connecting to null:3922
2013-12-23 18:19:46,351 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-353:ctx-a48feb80) Seq 1-790364876: Response Received:
2013-12-23 18:19:46,351 DEBUG [c.c.a.t.Request] (DirectAgent-353:ctx-a48feb80) Seq 1-790364876: Processing:  { Ans: , MgmtId: 132129494109518, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"PingTestCommand failed","wait":0}}] }
2013-12-23 18:19:46,351 DEBUG [c.c.a.t.Request] (AgentTaskPool-16:ctx-be3804c7) Seq 1-790364876: Received:  { Ans: , MgmtId: 132129494109518, via: 1, Ver: v1, Flags: 10, { Answer } }
2013-12-23 18:19:46,351 DEBUG [c.c.h.AbstractInvestigatorImpl] (AgentTaskPool-16:ctx-be3804c7) host (10.147.40.31) cannot be pinged, returning null ('I don't know')
2013-12-23 18:19:46,351 DEBUG [c.c.h.UserVmDomRInvestigator] (AgentTaskPool-16:ctx-be3804c7) could not reach agent, could not reach agent's host, returning that we don't have enough information
2013-12-23 18:19:46,351 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-16:ctx-be3804c7) PingInvestigator unable to determine the state of the host.  Moving on.
2013-12-23 18:19:46,351 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-16:ctx-be3804c7) ManagementIPSysVMInvestigator unable to determine the state of the host.  Moving on.
2013-12-23 18:19:46,351 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-16:ctx-be3804c7) KVMInvestigator unable to determine the state of the host.  Moving on.
2013-12-23 18:19:46,351 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-16:ctx-be3804c7) VMwareInvestigator unable to determine the state of the host.  Moving on.
2013-12-23 18:19:46,351 WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-16:ctx-be3804c7) Agent state cannot be determined, do nothing

Attaching MS log and cloud DB.

Agent 5 is the host which was powered off.
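
To make the failure mode in the log above easier to follow, here is a minimal sketch of an investigator chain in which every investigator answers "I don't know". It is not CloudStack's actual code: the function name investigate, the state strings, and the tuple layout are assumptions for illustration only. Only the investigator names and the log wording are taken from the excerpt above.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each investigator returns True (host alive), False (host down),
# or None ("I don't know"), mirroring the wording in the log excerpt.
# The (name, check) tuple layout is a hypothetical simplification.
Investigator = Tuple[str, Callable[[int], Optional[bool]]]


def investigate(host_id: int, current_state: str,
                investigators: Sequence[Investigator]) -> str:
    """Return the host state after asking each investigator in turn.

    The first definite answer wins. If nobody can tell, the current
    state is returned unchanged, which is how a powered-off host can
    stay "Up" in this sketch.
    """
    for name, check in investigators:
        alive = check(host_id)
        if alive is None:
            print(f"{name} unable to determine the state of the host.  Moving on.")
            continue
        # "Down" is an illustrative label, not a claim about the real state machine.
        return "Up" if alive else "Down"
    print("Agent state cannot be determined, do nothing")
    return current_state


# The four investigators named in the log, all unable to decide.
chain = [
    ("PingInvestigator", lambda host_id: None),
    ("ManagementIPSysVMInvestigator", lambda host_id: None),
    ("KVMInvestigator", lambda host_id: None),
    ("VMwareInvestigator", lambda host_id: None),
]

state = investigate(5, "Up", chain)
print(f"host 5 state: {state}")
```

In this sketch the state only changes when some investigator gives a definite answer. None of the investigators listed in the log could do so for the Hyper-v host, so host 5 stays in Up state.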



