You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@cloudstack.apache.org by "Koushik Das (JIRA)" <ji...@apache.org> on 2013/07/04 01:57:19 UTC

[jira] [Resolved] (CLOUDSTACK-3137) systemvms HA fails on latest master (4591f94)

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koushik Das resolved CLOUDSTACK-3137.
-------------------------------------

       Resolution: Cannot Reproduce
    Fix Version/s: 4.2.0

Resolving it as not able to repro. Reopen if you see that the system vms are not at all coming up.
                
> systemvms HA fails on latest master (4591f94)
> ---------------------------------------------
>
>                 Key: CLOUDSTACK-3137
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3137
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>            Reporter: Prasanna Santhanam
>            Assignee: Koushik Das
>            Priority: Blocker
>             Fix For: 4.2.0
>
>         Attachments: logs.tar.bz2
>
>
> SystemVMs fail to start during HA on latest master (4591f94a0bb69875dca769c9738998d3b5b96106)
> It looks like the HA of system VMs is failing. On our automated test environment we reboot the management server before the systemVMs spin up to ensure that the global settings are affected and the configuration for the secstorage.internal.sites is rightly set in the secondary storage VM. The management server upon reboot discovers that SSVM was put to perform HA in the op_ha_work table and starts to re-spin the SSVM. At this time the CPVM is also booting up. No further activity is observed after two HA attempts.
> SystemVM start before Management server start: 
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,738 DEBUG [cloud.vm.VirtualMachineManagerImpl] (secstorage-1:null) Allocating entries for VM: VM[SecondaryStorageVm|s-1-VM]
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,771 DEBUG [cloud.vm.VirtualMachineManagerImpl] (secstorage-1:null) Allocating nics for VM[SecondaryStorageVm|s-1-VM]
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,774 DEBUG [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm VM[SecondaryStorageVm|s-1-VM] in network Ntwk[200|Public|1] with requested profile NicProfile[0-0-null-null-null
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,843 DEBUG [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm VM[SecondaryStorageVm|s-1-VM] in network Ntwk[202|Control|3] with requested profile null
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,870 DEBUG [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm VM[SecondaryStorageVm|s-1-VM] in network Ntwk[201|Management|2] with requested profile null
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,916 DEBUG [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm VM[SecondaryStorageVm|s-1-VM] in network Ntwk[203|Storage|4] with requested profile null
> ...
> ...
> Jun 24 22:30:52 cloudstack-centos63 local0: 2013-06-25 05:30:52,552 DEBUG [agent.transport.Request] (consoleproxy-1:null) Seq 3-2067660811: Sending  { Cmd , MgmtId: 200973787296321, via: 3, Ver: v1, Flags: 100111, [{"org.apache.cloudstack.storage.command.CopyCommand":{"srcTO":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"path":"template/tmpl/1/1/","origUrl":"http://nfs/templates/acton/acton-systemvm-02062012.vhd.bz2","uuid":"1","id":1,"format":"VHD","accountId":1,"checksum":"f613f38c96bf039f2e5cbf92fa8ad4f8","hvm":false,"displayText":"SystemVM Template (XenServer)","imageDataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://nfs.fmt.vmops.com:/expo...
> Jun 24 22:31:01 cloudstack-centos63 local0: 2013-06-25 05:31:01,824 INFO  [context.support.XmlWebApplicationContext] (Thread-26:null) Closing Root WebApplicationContext: startup date [Mon Jun 24 23:25:50 UTC 2013]; root of context hierarchy
> ^^^^^^^^^^ Management Server Restarts ^^^^^^^^^^^^^^
> After restart SSVM scheduled for HA
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,394 INFO  [cloud.ha.HighAvailabilityManagerImpl] (Timer-1:null) Schedule vm for HA:  VM[SecondaryStorageVm|s-1-VM]
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,455 INFO  [cloud.vm.VirtualMachineManagerImpl] (Timer-1:null) Handling unfinished work item: ItWork[1805a89b-d03a-4f1f-b717-ac9f5da6edea-Starting-2-Prepare]
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,504 INFO  [cloud.ha.HighAvailabilityManagerImpl] (Timer-1:null) Schedule vm for HA:  VM[ConsoleProxy|v-2-VM]
> ...
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,540 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:null) Starting work
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,547 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:null) Starting work
> ...
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,815 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) Processing HAWork[1-HA-1-Starting-Investigating]
> ...
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,289 DEBUG [cloud.ha.CheckOnAgentInvestigator] (HA-Worker-0:work-2) Unable to reach the agent for VM[ConsoleProxy|v-2-VM]: Resource [Host:3] is unreachable: Host 3: Host with specified id is not in the right state: Disconnected
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,290 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) SimpleInvestigator found VM[ConsoleProxy|v-2-VM]to be alive? null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,293 DEBUG [cloud.ha.CheckOnAgentInvestigator] (HA-Worker-2:work-1) Unable to reach the agent for VM[SecondaryStorageVm|s-1-VM]: Resource [Host:2] is unreachable: Host 2: Host with specified id is not in the right state: Disconnected
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,298 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) SimpleInvestigator found VM[SecondaryStorageVm|s-1-VM]to be alive? null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,336 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) XenServerInvestigator found VM[ConsoleProxy|v-2-VM]to be alive? null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 DEBUG [cloud.ha.UserVmDomRInvestigator] (HA-Worker-0:work-2) Not a User Vm, unable to determine state of VM[ConsoleProxy|v-2-VM] returning null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) null found VM[ConsoleProxy|v-2-VM]to be alive? null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 DEBUG [cloud.ha.ManagementIPSystemVMInvestigator] (HA-Worker-0:work-2) Testing if VM[ConsoleProxy|v-2-VM] is alive
> ....
> Rescheduled HA
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,623 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) Rescheduling HAWork[2-HA-2-Starting-Investigating] to try again at Tue Jun 25 05:43:07 UTC 2013
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,626 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) Rescheduling HAWork[1-HA-1-Starting-Investigating] to try again at Tue Jun 25 05:43:07 UTC 2013
> ...
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,586 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) Processing HAWork[1-HA-1-Starting-Investigating]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,592 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) HA on VM[SecondaryStorageVm|s-1-VM]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,593 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) VM VM[SecondaryStorageVm|s-1-VM] has been changed.  Current State = Starting Previous State = Starting last updated = 6 previous updated = 2
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,593 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) Completed HAWork[1-HA-1-Starting-Investigating]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,611 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) Processing HAWork[2-HA-2-Starting-Investigating]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) HA on VM[ConsoleProxy|v-2-VM]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) VM VM[ConsoleProxy|v-2-VM] has been changed.  Current State = Starting Previous State = Starting last updated = 6 previous updated = 2
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) Completed HAWork[2-HA-2-Starting-Investigating]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira