Posted to issues@cloudstack.apache.org by "Koushik Das (JIRA)" <ji...@apache.org> on 2014/06/12 14:29:02 UTC
[jira] [Commented] (CLOUDSTACK-6857) Losing the connection from CloudStack Manager to the agent will force a shutdown when connection is re-established
[ https://issues.apache.org/jira/browse/CLOUDSTACK-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029099#comment-14029099 ]
Koushik Das commented on CLOUDSTACK-6857:
-----------------------------------------
Can you share the full logs? Based on the log snippet, none of the available investigators were able to determine whether the VM is alive. In such a case, components called 'fencers' try to fence off the VM. If the fencers also fail, nothing is done to the VM. The full logs will help in understanding everything that happened.
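For context, the decision chain described above (investigators first, fencers as a fallback) can be sketched roughly as follows. This is a minimal illustration of the flow, not CloudStack's actual code; the function and parameter names are assumptions.

```python
# Hypothetical sketch of the HA check flow: each investigator returns
# True (alive), False (dead), or None (cannot determine). Only when every
# investigator returns None are the fencers tried.
def check_vm_alive(vm, investigators, fencers):
    for investigate in investigators:
        alive = investigate(vm)  # True, False, or None
        if alive is not None:
            return alive  # a definitive answer short-circuits the chain
    # No investigator could decide: try to fence the VM off instead.
    for fence in fencers:
        if fence(vm):  # True means the VM was successfully fenced off
            return False  # fenced VM is treated as not alive
    # All fencers failed: nothing is done to the VM.
    return None
```

In the logs below, every investigator (PingInvestigator, ManagementIPSysVMInvestigator, KVMInvestigator, HypervInvestigator) reports `null`, which corresponds to the all-`None` path here.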
> Losing the connection from CloudStack Manager to the agent will force a shutdown when connection is re-established
> ------------------------------------------------------------------------------------------------------------------
>
> Key: CLOUDSTACK-6857
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6857
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the default.)
> Components: Management Server
> Affects Versions: 4.3.0
> Environment: Ubuntu 12.04
> Reporter: c-hemp
> Priority: Critical
>
> If a physical host is not pingable, that host goes into alert mode. If the physical host is unreachable, the virtual router is either unreachable or unable to ping a virtual machine on that host, and the manager is also unable to ping the virtual instance, so it assumes the instance is down and puts it into a stopped state.
> When the connection is re-established, the manager reads the state from the database, sees that the instance is now in a stopped state, and then shuts the instance down.
> This behavior can cause major outages after any kind of network loss, once connectivity comes back. This is especially critical when using CloudStack across multiple colos.
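The problematic reconnect behavior described above can be sketched as follows. This is an illustrative reconstruction of the reported symptom, not CloudStack's actual implementation; the state names and `shutdown` callback are assumptions.

```python
# Hypothetical sketch of the reported (buggy) reconnect behavior:
# the manager trusts the database state over the host's actual state.
def on_agent_reconnect(db_state, actual_state, shutdown):
    if db_state == "Stopped" and actual_state == "Running":
        # The manager marked the VM Stopped while the host was unreachable,
        # so on reconnect it forces the still-running instance down.
        shutdown()
        return "Stopped"
    return db_state
```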
> The logs when it happens:
> 2014-06-06 02:01:22,259 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-be848615 work-1953) PingInvestigator found VM[User|cephvmstage013]to be alive? null
> 2014-06-06 02:01:22,259 DEBUG [c.c.h.ManagementIPSystemVMInvestigator] (HA-Worker-1:ctx-be848615 work-1953) Not a System Vm, unable to determine state of VM[User|cephvmstage013] returning null
> 2014-06-06 02:01:22,259 DEBUG [c.c.h.ManagementIPSystemVMInvestigator] (HA-Worker-1:ctx-be848615 work-1953) Testing if VM[User|cephvmstage013] is alive
> 2014-06-06 02:01:22,260 DEBUG [c.c.h.ManagementIPSystemVMInvestigator] (HA-Worker-1:ctx-be848615 work-1953) Unable to find a management nic, cannot ping this system VM, unable to determine state of VM[User|cephvmstage013] returning null
> 2014-06-06 02:01:22,260 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-be848615 work-1953) ManagementIPSysVMInvestigator found VM[User|cephvmstage013]to be alive? null
> 2014-06-06 02:01:22,263 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-e8eea7fb work-1950) KVMInvestigator found VM[User|cephvmstage013]to be alive? null
> 2014-06-06 02:01:22,263 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-e8eea7fb work-1950) HypervInvestigator found VM[User|cephvmstage013]to be alive? null
> 2014-06-06 02:01:22,419 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-be848615 work-1953) KVMInvestigator found VM[User|cephvmstage013]to be alive? null
> 2014-06-06 02:01:22,419 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-be848615 work-1953) HypervInvestigator found VM[User|cephvmstage013]to be alive? null
> 2014-06-06 02:01:22,584 WARN [c.c.v.VirtualMachineManagerImpl] (HA-Worker-1:ctx-be848615 work-1953) Unable to actually stop VM[User|cephvmstage013] but continue with release because it's a force stop
> 2014-06-06 02:01:22,585 DEBUG [c.c.v.VirtualMachineManagerImpl] (HA-Worker-1:ctx-be848615 work-1953) VM[User|cephvmstage013] is stopped on the host. Proceeding to release resource held.
> 2014-06-06 02:01:22,648 WARN [c.c.v.VirtualMachineManagerImpl] (HA-Worker-4:ctx-e8eea7fb work-1950) Unable to actually stop VM[User|cephvmstage013] but continue with release because it's a force stop
> 2014-06-06 02:01:22,650 DEBUG [c.c.v.VirtualMachineManagerImpl] (HA-Worker-4:ctx-e8eea7fb work-1950) VM[User|cephvmstage013] is stopped on the host. Proceeding to release resource held.
> 2014-06-06 02:01:22,704 DEBUG [c.c.v.VirtualMachineManagerImpl] (HA-Worker-4:ctx-e8eea7fb work-1950) Successfully released network resources for the vm VM[User|cephvmstage013]
> 2014-06-06 02:01:22,704 DEBUG [c.c.v.VirtualMachineManagerImpl] (HA-Worker-4:ctx-e8eea7fb work-1950) Successfully released storage resources for the vm VM[User|cephvmstage013]
> 2014-06-06 02:01:22,774 DEBUG [c.c.v.VirtualMachineManagerImpl] (HA-Worker-1:ctx-be848615 work-1953) Successfully released network resources for the vm VM[User|cephvmstage013]
> 2014-06-06 02:01:22,774 DEBUG [c.c.v.VirtualMachineManagerImpl] (HA-Worker-1:ctx-be848615 work-1953) Successfully released storage resources for the vm VM[User|cephvmstage013]
> The behavior should change: set the instance into an alert state, and then, once connectivity is re-established, if the instance is up, update the manager with the running status.
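The suggested behavior above can be sketched as follows, for contrast with the current one. Again, this is only an illustration of the proposal under assumed state names (`Alert`, `Running`) and an assumed `update_db` callback, not a patch.

```python
# Hypothetical sketch of the proposed reconnect behavior: reconcile the
# database with the host's actual state instead of forcing a shutdown.
def on_agent_reconnect_proposed(db_state, actual_state, update_db):
    if db_state == "Alert" and actual_state == "Running":
        # Connectivity is back and the instance survived the outage:
        # record Running rather than shutting the instance down.
        update_db("Running")
        return "Running"
    return db_state
```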
--
This message was sent by Atlassian JIRA
(v6.2#6252)