Posted to issues@cloudstack.apache.org by "Sanjeev N (JIRA)" <ji...@apache.org> on 2014/01/07 14:11:53 UTC

[jira] [Reopened] (CLOUDSTACK-5610) [Hyper-v] Host does not go into Alert state even though it is power-off hence vm deployment fails

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanjeev N reopened CLOUDSTACK-5610:
-----------------------------------


This fix does not work if there are two hosts in the cluster and one of them is in Maintenance mode.
I tried the following scenario, and CS was unable to determine the status of the disconnected host.

1. Bring up CS with two Hyper-V hosts in a cluster.
2. Put one host (say host1) in maintenance mode.
3. Simulate a network disconnect by unplugging the network cable on the other host (say host2).
4. CS is unable to determine the state of host2, and it remains in the Up state.

The following is a snippet from the MS log:

2014-01-07 18:29:09,003 DEBUG [c.c.h.AbstractInvestigatorImpl] (AgentTaskPool-11:ctx-05078172) host (10.147.40.14) cannot be pinged, returning null ('I don't know')
2014-01-07 18:29:09,003 DEBUG [c.c.h.UserVmDomRInvestigator] (AgentTaskPool-11:ctx-05078172) could not reach agent, could not reach agent's host, returning that we don't have enough information
2014-01-07 18:29:09,003 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-11:ctx-05078172) PingInvestigator unable to determine the state of the host.  Moving on.
2014-01-07 18:29:09,003 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-11:ctx-05078172) ManagementIPSysVMInvestigator unable to determine the state of the host.  Moving on.
2014-01-07 18:29:09,003 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-11:ctx-05078172) KVMInvestigator unable to determine the state of the host.  Moving on.
2014-01-07 18:29:09,006 WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-11:ctx-05078172) Resource [Host:4] is unreachable: Host 4: Unable to send class com.cloud.agent.api.CheckOnHostCommand because agent 10.147.40.31 is in maintenance mode
2014-01-07 18:29:09,006 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-11:ctx-05078172) HypervInvestigator unable to determine the state of the host.  Moving on.
2014-01-07 18:29:09,006 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-11:ctx-05078172) VMwareInvestigator unable to determine the state of the host.  Moving on.
2014-01-07 18:29:09,006 WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-11:ctx-05078172) Agent state cannot be determined, do nothing
2014-01-07 18:29:11,806 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-6:ctx-268ddd2e) Seq 6-210567216: Waiting some more time because this is the current command
2014-01-07 18:29:11,807 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-2:ctx-d7b3d598) Seq 7-327352402: Waiting some more time because this is the current command
2014-01-07 18:29:14,040 DEBUG [c.c.c.ConsoleProxyManagerImpl] (consoleproxy-1:ctx-60162ca0) Zone 1 is ready to launch console proxy
2014-01-07 18:29:14,179 DEBUG [c.c.s.s.SecondaryStorageManagerImpl] (secstorage-1:ctx-e954fb52) Zone 1 is ready to launch secondary storage VM
2014-01-07 18:29:19,874 ERROR [c.c.h.h.r.HypervDirectConnectResource] (DirectAgent-331:ctx-35463264) org.apache.http.conn.HttpHostConnectException: Connection to http://10.147.40.14:8250 refused
2014-01-07 18:29:19,874 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-331:ctx-35463264) Seq 1-820904008: Response Received:
2014-01-07 18:29:19,874 DEBUG [c.c.a.t.Request] (DirectAgent-331:ctx-35463264) Seq 1-820904008: Processing:  { Ans: , MgmtId: 132129494109518, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.UnsupportedAnswer":{"result":false,"details":"Unsupported command issued:com.cloud.agent.api.GetHostStatsCommand.  Are you sure you got the right type of server?","wait":0}}] }
2014-01-07 18:29:19,874 DEBUG [c.c.a.t.Request] (StatsCollector-3:ctx-5e12f369) Seq 1-820904008: Received:  { Ans: , MgmtId: 132129494109518, via: 1, Ver: v1, Flags: 10, { UnsupportedAnswer } }
2014-01-07 18:29:19,874 WARN  [c.c.a.m.AgentManagerImpl] (StatsCollector-3:ctx-5e12f369) Unsupported Command: Unsupported command issued:com.cloud.agent.api.GetHostStatsCommand.  Are you sure you got the right type of server?
2014-01-07 18:29:19,875 DEBUG [c.c.a.m.AgentManagerImpl] (StatsCollector-3:ctx-5e12f369) Details from executing class com.cloud.agent.api.GetHostStatsCommand: Unsupported command issued:com.cloud.agent.api.GetHostStatsCommand.  Are you sure you got the right type of server?
2014-01-07 18:29:19,875 WARN  [c.c.s.StatsCollector] (StatsCollector-3:ctx-5e12f369) Received invalid host stats for host: 1
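The log above suggests a fall-through pattern: each investigator answers "I don't know", the HA manager moves on to the next one, and when none reaches a verdict the agent manager does nothing. In this scenario the attempt to route CheckOnHostCommand through 10.147.40.31 fails because that agent is in maintenance mode, so no peer is left to check the disconnected host. The Java sketch below is a hypothetical model of that behaviour, not CloudStack's actual code. All names (InvestigatorChainSketch, PeerCheckInvestigator, investigate, and the `reachable` flag that stands in for the result of a peer-side probe) are assumptions made for illustration.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical model of an HA investigator chain. None of these names are
 * CloudStack APIs; they only illustrate the behaviour seen in the MS log.
 */
public class InvestigatorChainSketch {

    enum HostState { UP, DOWN }

    enum HostMode { ENABLED, MAINTENANCE }

    /** `reachable` simulates what a peer-side probe of this host would report. */
    record Host(String name, HostMode mode, boolean reachable) {}

    /** An empty Optional means "I don't know". */
    interface Investigator {
        String name();
        Optional<HostState> isAgentAlive(Host target, List<Host> cluster);
    }

    /** Asks another host in the same cluster to check the target host. */
    static final class PeerCheckInvestigator implements Investigator {
        public String name() { return "PeerCheckInvestigator"; }

        public Optional<HostState> isAgentAlive(Host target, List<Host> cluster) {
            for (Host peer : cluster) {
                // A peer in maintenance mode cannot be sent commands,
                // so it is unusable for checking the target (assumption
                // modelled on the "is in maintenance mode" log line).
                if (peer.equals(target) || peer.mode() == HostMode.MAINTENANCE) {
                    continue;
                }
                // Use the first usable peer's (simulated) probe result.
                return Optional.of(target.reachable() ? HostState.UP : HostState.DOWN);
            }
            return Optional.empty(); // no usable peer: "I don't know"
        }
    }

    /** Tries each investigator in order; empty means "cannot be determined". */
    static Optional<HostState> investigate(Host target, List<Host> cluster,
                                           List<Investigator> chain) {
        for (Investigator inv : chain) {
            Optional<HostState> verdict = inv.isAgentAlive(target, cluster);
            if (verdict.isPresent()) {
                return verdict;
            }
            System.out.println(inv.name()
                    + " unable to determine the state of the host.  Moving on.");
        }
        return Optional.empty(); // caller would "do nothing"
    }

    public static void main(String[] args) {
        Host host1 = new Host("host1", HostMode.MAINTENANCE, true);
        Host host2 = new Host("host2", HostMode.ENABLED, false); // cable unplugged
        List<Investigator> chain = List.of(new PeerCheckInvestigator());

        // Reopened scenario: the only peer is in maintenance, so no verdict.
        System.out.println(investigate(host2, List.of(host1, host2), chain));

        // With a third, enabled peer the chain can conclude DOWN.
        Host host3 = new Host("host3", HostMode.ENABLED, true);
        System.out.println(investigate(host2, List.of(host1, host2, host3), chain));
    }
}
```

In this model the two-host case yields no verdict, which matches "Agent state cannot be determined, do nothing", while adding a third enabled host lets the chain conclude that host2 is down. The sketch only illustrates the symptom; it does not claim how the actual fix should be structured.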


> [Hyper-v] Host does not go into Alert state even though it is power-off hence vm deployment fails
> -------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-5610
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5610
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Hypervisor Controller, Management Server
>    Affects Versions: 4.3.0
>         Environment: Latest build from 4.3 with commit :d462db4ae5c30e677d5810111f9ea5ca6812bce2
> Storage: SMB for both primary and secondary
> Hypervisor: Hyper-v
>            Reporter: Sanjeev N
>            Assignee: Devdeep Singh
>            Priority: Blocker
>              Labels: hyper-V,
>             Fix For: 4.3.0
>
>         Attachments: cloud.dmp, management-server.rar
>
>
> [Hyper-v] Host does not go into Alert state even though it is power-off hence vm deployment fails
> Steps to Reproduce:
> =================
> 1. Bring up CS in an advanced zone with 2 or more Hyper-V hosts, using SMB for both primary and secondary storage.
> 2. Enable the zone and deploy a few VMs. Make sure that the VMs are distributed across all the hosts.
> 3. Power off one of the hosts (one on which VMs are running).
> Expected Result:
> ==============
> The host should go into the Alert state, and all the VMs running on it should be stopped.
> Actual Result:
> ============
> The host remains in the Up state, and all of its VMs are shown as Running.
> I could see the ping commands to the Hyper-V agent and the system VM agents in the MS log. Even though the agents are behind on ping, the agent status remains Up.
> In this state, I tried to deploy a VM, and the deployment planner chose the host that was powered off, so the VM deployment failed.
> The CPVM was also running on the powered-off host, and it also remained in the Running state. Since the CPVM agent is not reachable from CS, it should have been stopped and started on another host in the cluster.
> 2013-12-23 18:19:25,334 ERROR [c.c.h.h.r.HypervDirectConnectResource] (DirectAgent-331:ctx-831c60e9) org.apache.http.conn.HttpHostConnectException: Connection to http://10.147.40.31:8250 refused
> 2013-12-23 18:19:25,334 INFO  [c.c.h.h.r.HypervDirectConnectResource] (DirectAgent-331:ctx-831c60e9) Cannot ping host 10.147.40.31 (IP 10.147.40.31), pingAns (blank means null) is:com.cloud.agent.api.UnsupportedAnswer
> 2013-12-23 18:19:25,334 WARN  [c.c.a.m.DirectAgentAttache] (DirectAgent-331:ctx-831c60e9) Unable to get current status on 5(10.147.40.31)
> 2013-12-23 18:19:25,336 INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-16:ctx-be3804c7) Investigating why host 5 has disconnected with event AgentDisconnected
> 2013-12-23 18:19:25,336 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-16:ctx-be3804c7) checking if agent (5) is alive
> 2013-12-23 18:19:25,339 DEBUG [c.c.a.t.Request] (AgentTaskPool-16:ctx-be3804c7) Seq 5-1482556239: Sending  { Cmd , MgmtId: 132129494109518, via: 5(10.147.40.31), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckHealthCommand":{"wait":50}}] }
> 2013-12-23 18:19:25,339 DEBUG [c.c.a.t.Request] (AgentTaskPool-16:ctx-be3804c7) Seq 5-1482556239: Executing:  { Cmd , MgmtId: 132129494109518, via: 5(10.147.40.31), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckHealthCommand":{"wait":50}}] }
> 2013-12-23 18:19:25,339 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-325:ctx-39f5ed39) Seq 5-1482556239: Executing request
> 2013-12-23 18:19:25,339 DEBUG [c.c.h.h.r.HypervDirectConnectResource] (DirectAgent-325:ctx-39f5ed39) POST request tohttp://10.147.40.31:8250/api/HypervResource/com.cloud.agent.api.CheckHealthCommand with contents{"contextMap":{},"wait":50}
> 2013-12-23 18:19:25,340 DEBUG [c.c.h.h.r.HypervDirectConnectResource] (DirectAgent-325:ctx-39f5ed39) Sending cmd to http://10.147.40.31:8250/api/HypervResource/com.cloud.agent.api.CheckHealthCommand cmd data:{"contextMap":{},"wait":50}
> 2013-12-23 18:19:46,345 DEBUG [c.c.h.UserVmDomRInvestigator] (AgentTaskPool-16:ctx-be3804c7) checking if agent (5) is alive
> 2013-12-23 18:19:46,347 DEBUG [c.c.h.UserVmDomRInvestigator] (AgentTaskPool-16:ctx-be3804c7) sending ping from (1) to agent's host ip address (10.147.40.31)
> 2013-12-23 18:19:46,349 DEBUG [c.c.a.t.Request] (AgentTaskPool-16:ctx-be3804c7) Seq 1-790364876: Sending  { Cmd , MgmtId: 132129494109518, via: 1(10.147.40.14), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.PingTestCommand":{"_computingHostIp":"10.147.40.31","wait":20}}] }
> 2013-12-23 18:19:46,349 DEBUG [c.c.a.t.Request] (AgentTaskPool-16:ctx-be3804c7) Seq 1-790364876: Executing:  { Cmd , MgmtId: 132129494109518, via: 1(10.147.40.14), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.PingTestCommand":{"_computingHostIp":"10.147.40.31","wait":20}}] }
> 2013-12-23 18:19:46,350 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-353:ctx-a48feb80) Seq 1-790364876: Executing request
> 2013-12-23 18:19:46,350 INFO  [c.c.h.h.r.HypervDirectConnectResource] (DirectAgent-353:ctx-a48feb80) Executing resource PingTestCommand: {"_computingHostIp":"10.147.40.31","contextMap":{},"wait":20}
> 2013-12-23 18:19:46,351 ERROR [c.c.h.h.r.HypervDirectConnectResource] (DirectAgent-353:ctx-a48feb80) Unable to execute ping command on DomR (null), domR may not be ready yet. failure due to There was a problem while connecting to null:3922
> 2013-12-23 18:19:46,351 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-353:ctx-a48feb80) Seq 1-790364876: Response Received:
> 2013-12-23 18:19:46,351 DEBUG [c.c.a.t.Request] (DirectAgent-353:ctx-a48feb80) Seq 1-790364876: Processing:  { Ans: , MgmtId: 132129494109518, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"PingTestCommand failed","wait":0}}] }
> 2013-12-23 18:19:46,351 DEBUG [c.c.a.t.Request] (AgentTaskPool-16:ctx-be3804c7) Seq 1-790364876: Received:  { Ans: , MgmtId: 132129494109518, via: 1, Ver: v1, Flags: 10, { Answer } }
> 2013-12-23 18:19:46,351 DEBUG [c.c.h.AbstractInvestigatorImpl] (AgentTaskPool-16:ctx-be3804c7) host (10.147.40.31) cannot be pinged, returning null ('I don't know')
> 2013-12-23 18:19:46,351 DEBUG [c.c.h.UserVmDomRInvestigator] (AgentTaskPool-16:ctx-be3804c7) could not reach agent, could not reach agent's host, returning that we don't have enough information
> 2013-12-23 18:19:46,351 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-16:ctx-be3804c7) PingInvestigator unable to determine the state of the host.  Moving on.
> 2013-12-23 18:19:46,351 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-16:ctx-be3804c7) ManagementIPSysVMInvestigator unable to determine the state of the host.  Moving on.
> 2013-12-23 18:19:46,351 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-16:ctx-be3804c7) KVMInvestigator unable to determine the state of the host.  Moving on.
> 2013-12-23 18:19:46,351 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-16:ctx-be3804c7) VMwareInvestigator unable to determine the state of the host.  Moving on.
> 2013-12-23 18:19:46,351 WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-16:ctx-be3804c7) Agent state cannot be determined, do nothing
> Attaching MS log and cloud DB.
> Agent 5 is the host which was powered off.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)