Posted to dev@cloudstack.apache.org by Ivan Gladenko <we...@colobridge.net> on 2013/09/11 10:06:27 UTC

Bug in CS 4.2 High Availability under KVM

Hi,
After some testing on 4.2 we found a bug in the high availability
mechanism (2 hosts) on the KVM hypervisor (revision 2852) on CentOS 6.4.
When a host goes down, KVMInvestigator correctly marks it as unavailable,
but its VMs are not started automatically on the other host, even though
all resources are available.

Logs:
"

2013-09-10 15:06:48,583 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) Processing HAWork[28-HA-246-Running-Investigating]
2013-09-10 15:06:48,586 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) HA on VM[User|hahaha]
2013-09-10 15:06:48,588 DEBUG [cloud.ha.CheckOnAgentInvestigator]
(HA-Worker-0:work-28) Unable to reach the agent for VM[User|hahaha]:
Resource [Host:4] is unreachable: Host 4: Host with specified id is not in
the right state: Down
2013-09-10 15:06:48,588 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) SimpleInvestigator found VM[User|hahaha]to be alive?
null
2013-09-10 15:06:48,593 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) XenServerInvestigator found VM[User|hahaha]to be
alive? null
2013-09-10 15:06:48,593 DEBUG [cloud.ha.UserVmDomRInvestigator]
(HA-Worker-0:work-28) testing if VM[User|hahaha] is alive
2013-09-10 15:06:48,598 DEBUG [agent.manager.AgentManagerImpl]
(HA-Worker-0:work-28) Host with id null doesn't exist
2013-09-10 15:06:48,598 DEBUG [cloud.ha.UserVmDomRInvestigator]
(HA-Worker-0:work-28) VM[User|hahaha] could not be pinged, returning that
it is unknown
2013-09-10 15:06:48,599 DEBUG [agent.transport.Request]
(HA-Worker-0:work-28) Seq 1-1213400199: Sending  { Cmd , MgmtId:
161332943028, via: 1, Ver: v1, Flags: 100011,
[{"com.cloud.agent.api.PingTestCommand":{"_routerIp":"169.254.0.190","_privateIp":"10.10.10.17","wait":20}}]
}
2013-09-10 15:06:52,804 DEBUG [agent.transport.Request]
(HA-Worker-0:work-28) Seq 1-1213400199: Received:  { Ans: , MgmtId:
161332943028, via: 1, Ver: v1, Flags: 10, { Answer } }
2013-09-10 15:06:52,804 DEBUG [agent.manager.AgentManagerImpl]
(HA-Worker-0:work-28) Details from executing class
com.cloud.agent.api.PingTestCommand: PING 10.10.10.17 (10.10.10.17): 56
data bytes64 bytes from 10.10.10.222: Destination Host UnreachableVr HL TOS
  Len   ID Flg  off TTL Pro  cks      Src      Dst Data 4  5  00 5400 0000
0 0040  40  01 a711 10.10.10.222  10.10.10.17 --- 10.10.10.17 ping
statistics ---1 packets transmitted, 0 packets received, 100% packet
lossUnable to ping the vm, exiting
2013-09-10 15:06:52,804 DEBUG [cloud.ha.UserVmDomRInvestigator]
(HA-Worker-0:work-28) VM[User|hahaha] could not be pinged, returning that
it is unknown
2013-09-10 15:06:52,804 DEBUG [cloud.ha.UserVmDomRInvestigator]
(HA-Worker-0:work-28) Returning null since we're unable to determine state
of VM[User|hahaha]
2013-09-10 15:06:52,804 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) null found VM[User|hahaha]to be alive? null
2013-09-10 15:06:52,804 DEBUG [cloud.ha.ManagementIPSystemVMInvestigator]
(HA-Worker-0:work-28) Not a System Vm, unable to determine state of
VM[User|hahaha] returning null
2013-09-10 15:06:52,804 DEBUG [cloud.ha.ManagementIPSystemVMInvestigator]
(HA-Worker-0:work-28) Testing if VM[User|hahaha] is alive
2013-09-10 15:06:52,808 DEBUG [cloud.ha.ManagementIPSystemVMInvestigator]
(HA-Worker-0:work-28) Unable to find a management nic, cannot ping this
system VM, unable to determine state of VM[User|hahaha] returning null
2013-09-10 15:06:52,808 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) null found VM[User|hahaha]to be alive? null
2013-09-10 15:06:52,812 DEBUG [agent.transport.Request]
(HA-Worker-0:work-28) Seq 1-1213400206: Sending  { Cmd , MgmtId:
161332943028, via: 1, Ver: v1, Flags: 100011,
[{"com.cloud.agent.api.CheckOnHostCommand":{"host":{"guid":"6807c438-876d-3f73-ba01-8ad718fd774d-LibvirtComputingResource","privateNetwork":{"ip":"77.72.128.116","netmask":"255.255.255.240","mac":"00:25:90:36:20:6a","isSecurityGroupEnabled":false},"storageNetwork1":{"ip":"77.72.128.116","netmask":"255.255.255.240","mac":"00:25:90:36:20:6a","isSecurityGroupEnabled":false}},"wait":20}}]
}
2013-09-10 15:06:52,921 DEBUG [agent.transport.Request]
(HA-Worker-0:work-28) Seq 1-1213400206: Received:  { Ans: , MgmtId:
161332943028, via: 1, Ver: v1, Flags: 10, { Answer } }
2013-09-10 15:06:52,921 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) KVMInvestigator found VM[User|hahaha]to be alive? null
2013-09-10 15:06:52,921 DEBUG [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) Fencing off VM that we don't know the state of
2013-09-10 15:06:52,921 DEBUG [cloud.ha.XenServerFencer]
(HA-Worker-0:work-28) Don't know how to fence non XenServer hosts KVM
2013-09-10 15:06:52,921 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) Fencer null returned null
2013-09-10 15:06:52,926 DEBUG [agent.transport.Request]
(HA-Worker-0:work-28) Seq 1-1213400207: Sending  { Cmd , MgmtId:
161332943028, via: 1, Ver: v1, Flags: 100011,
[{"com.cloud.agent.api.FenceCommand":{"vmName":"i-2-246-VM","hostGuid":"6807c438-876d-3f73-ba01-8ad718fd774d-LibvirtComputingResource","hostIp":"77.72.128.116","inSeq":false,"wait":0}}]
}
2013-09-10 15:06:53,038 DEBUG [agent.transport.Request]
(HA-Worker-0:work-28) Seq 1-1213400207: Received:  { Ans: , MgmtId:
161332943028, via: 1, Ver: v1, Flags: 10, { FenceAnswer } }
2013-09-10 15:06:53,038 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) Fencer KVMFenceBuilder returned true
2013-09-10 15:06:53,046 DEBUG [cloud.capacity.CapacityManagerImpl]
(HA-Worker-0:work-28) VM state transitted from :Running to Stopping with
event: StopRequestedvm's original host id: 4 new host id: 4 host id before
state transition: 4
2013-09-10 15:06:53,048 DEBUG [cloud.vm.UserVmManagerImpl]
(HA-Worker-0:work-28) Collect vm disk statistics from host before stopping
Vm
2013-09-10 15:06:53,052 DEBUG [agent.manager.AgentManagerImpl]
(HA-Worker-0:work-28) Can not send command
com.cloud.agent.api.GetVmDiskStatsCommand due to Host 4 is not up
2013-09-10 15:06:53,054 WARN  [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-0:work-28) Unable to stop vm, agent unavailable:
com.cloud.exception.AgentUnavailableException: Resource [Host:4] is
unreachable: Host 4: Host with specified id is not in the right state: Down
2013-09-10 15:06:53,055 WARN  [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-0:work-28) Unable to actually stop VM[User|hahaha] but continue
with release because it's a force stop
2013-09-10 15:06:53,058 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-0:work-28) VM[User|hahaha] is stopped on the host.  Proceeding
to release resource held.
2013-09-10 15:06:53,062 DEBUG [cloud.network.NetworkModelImpl]
(HA-Worker-0:work-28) Service SecurityGroup is not supported in the network
id=205
2013-09-10 15:06:53,065 DEBUG [cloud.network.NetworkManagerImpl]
(HA-Worker-0:work-28) Changing active number of nics for network id=205 on
-1
2013-09-10 15:06:53,070 DEBUG [cloud.network.NetworkManagerImpl]
(HA-Worker-0:work-28) Asking VirtualRouter to release
Nic[942-246-9bc94718-8d0d-4463-83c4-7780cdfbe7d9-10.10.10.17]
2013-09-10 15:06:53,070 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-0:work-28) Successfully released network resources for the vm
VM[User|hahaha]
2013-09-10 15:06:53,071 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-0:work-28) Successfully released storage resources for the vm
VM[User|hahaha]
2013-09-10 15:06:53,084 DEBUG [cloud.network.NetworkModelImpl]
(HA-Worker-0:work-28) Service SecurityGroup is not supported in the network
id=205
2013-09-10 15:06:53,088 DEBUG [cloud.network.NetworkModelImpl]
(HA-Worker-0:work-28) Service SecurityGroup is not supported in the network
id=205
2013-09-10 15:06:53,096 DEBUG [cloud.capacity.CapacityManagerImpl]
(HA-Worker-0:work-28) VM state transitted from :Stopping to Stopped with
event: OperationSucceededvm's original host..
2013-09-10 15:06:53,114 ERROR [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) Terminating HAWork[28-HA-246-Running-Scheduled]
java.lang.NullPointerException
         at
com.cloud.storage.VolumeManagerImpl.canVmRestartOnAnotherServer(VolumeManagerImpl.java:2641)
         at
com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:516)
         at
com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)


"
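
To make a trace like the one above easier to read, here is a minimal Python
sketch that pulls out what each investigator reported and the first stack
frame of the exception. It assumes only the line shapes visible in this log
("<name> found VM[...]to be alive? <verdict>" and "at <method>(<file>:<line>)")
and tolerates the line wrapping added by the mail client. The function names
investigator_verdicts and first_frame are made up for this example and are
not part of CloudStack.

```python
import re

# "<name> found VM[...]to be alive? <verdict>"; \s+ also spans wrapped lines.
VERDICT = re.compile(
    r"\)\s+(\S+)\s+found\s+(VM\[[^\]]*\])\s*to\s+be\s+alive\?\s+(\S+)"
)

# "at <qualified.method>(<File.java>:<line>)"; "at" may sit on its own line.
FRAME = re.compile(r"\bat\s+([\w.$]+)\(([\w.]+):(\d+)\)")


def investigator_verdicts(log_text):
    """Return (investigator, vm, verdict) tuples in the order logged."""
    return VERDICT.findall(log_text)


def first_frame(log_text):
    """Return (method, file, line) of the first stack frame, or None."""
    match = FRAME.search(log_text)
    if match is None:
        return None
    method, source_file, line = match.groups()
    return method, source_file, int(line)


if __name__ == "__main__":
    import sys

    text = sys.stdin.read()
    for name, vm, verdict in investigator_verdicts(text):
        print(f"{name}: {vm} alive? {verdict}")
    print("first frame:", first_frame(text))
```

Feeding the management-server log to the script on standard input lists every
investigator verdict and shows where the stack trace starts, which here is
VolumeManagerImpl.canVmRestartOnAnotherServer.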