You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@cloudstack.apache.org by "Sangeetha Hariharan (JIRA)" <ji...@apache.org> on 2013/07/10 18:31:53 UTC
[jira] [Created] (CLOUDSTACK-3451) Parallel deployment - Xenserver - When deploying 30 Vms in parallel, some of the Vm deployment fails when “applying dhcp entry/applying userdata and password entry on router” and retry eventually happens when they succeed.

Sangeetha Hariharan created CLOUDSTACK-3451:
-----------------------------------------------

             Summary: Parallel deployment - Xenserver - When deploying 30 Vms in parallel, some of the Vm deployment fails when “applying dhcp entry/applying userdata and password entry on router”  and retry eventually happens when they succeed.
                 Key: CLOUDSTACK-3451
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3451
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Management Server
    Affects Versions: 4.2.0
         Environment: Build from 4.2
            Reporter: Sangeetha Hariharan
            Priority: Critical
             Fix For: 4.2.0


Steps to reproduce the problem:

Advanced zone set up with Xenserver host.

Deploy 30 Vms in parallel.


9 of the Vm deployments actually had a failure when trying to “apply userdata and password entry on router” / “apply dhcp entry”. But in all these cases I see that we stop the Vm that is in “Starting” state and immediately attempt to start the Vm which succeeds this time.

Issues:

1. We should not be seeing any failures during the jobs for "applying userdata and password entry on router” / “applying dhcp entry”.

2. Why is there a logic to retry the Vm deployment again ? In my case I have only 1 host in the setup which seems to be put in avoid state as part of the initial failure. But again the Vm gets successfully deployed in this host.

Jobs that went thru this scenario are:

[root@asfmgmt management]# grep -i "stopcommand" management-server.log  | grep Executing
2013-07-09 16:30:05,707 DEBUG [agent.transport.Request] (Job-Executor-18:job-18) Seq 1-729350370: Executing:  { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-9-VM","wait":0}}] }
2013-07-09 16:30:12,547 DEBUG [agent.transport.Request] (Job-Executor-15:job-15) Seq 1-729350377: Executing:  { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-7-VM","wait":0}}] }
2013-07-09 16:30:20,990 DEBUG [agent.transport.Request] (Job-Executor-38:job-38) Seq 1-729350381: Executing:  { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-26-VM","wait":0}}] }
2013-07-09 16:30:23,529 DEBUG [agent.transport.Request] (Job-Executor-25:job-25) Seq 1-729350383: Executing:  { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-18-VM","wait":0}}] }
2013-07-09 16:30:32,231 DEBUG [agent.transport.Request] (Job-Executor-35:job-35) Seq 1-729350390: Executing:  { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-27-VM","wait":0}}] }
2013-07-09 16:30:45,744 DEBUG [agent.transport.Request] (Job-Executor-17:job-17) Seq 1-729350407: Executing:  { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-8-VM","wait":0}}] }
2013-07-09 16:30:46,511 DEBUG [agent.transport.Request] (Job-Executor-36:job-36) Seq 1-729350408: Executing:  { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-25-VM","wait":0}}] }
2013-07-09 16:31:04,826 DEBUG [agent.transport.Request] (Job-Executor-35:job-35) Seq 1-729350422: Executing:  { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-27-VM","wait":0}}] }
2013-07-09 16:31:17,707 DEBUG [agent.transport.Request] (Job-Executor-26:job-26) Seq 1-729350432: Executing:  { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-15-VM","wait":0}}] }


Management server log snippet:

2013-07-09 16:30:22,214 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-38:job-38) Successfully cleanued up resources for the vm VM[U
ser|hello-16] in Starting state
2013-07-09 16:30:22,220 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Job-Executor-38:job-38) Deploy avoids pods: null, clusters: null,
hosts: [1]
2013-07-09 16:30:22,220 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Job-Executor-38:job-38) DataCenter id = '1' provided is in avoid s
et, DeploymentPlanner cannot allocate the VM, returning.
2013-07-09 16:30:22,236 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-38:job-38) VM state transitted from :Starting to Stopped with
event: OperationFailedvm's original host id: null new host id: null host id before state transition: 1
2013-07-09 16:30:22,241 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-38:job-38) Hosts's actual total CPU: 9044 and CPU after apply
ing overprovisioning: 9044
2013-07-09 16:30:22,241 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-38:job-38) Hosts's actual total RAM: 16190149248 and RAM afte
r applying overprovisioning: 16190149632
2013-07-09 16:30:22,241 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-38:job-38) release cpu from host: 1, old used: 4600,reserved:
0, actual total: 9044, total with overprovisioning: 9044; new used: 4500,reserved:0; movedfromreserved: false,moveToReserveredfalse
2013-07-09 16:30:22,241 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-38:job-38) release mem from host: 1, old used: 9602859008,res
erved: 0, total: 16190149632; new used: 9340715008,reserved:0; movedfromreserved: false,moveToReserveredfalse
2013-07-09 16:30:22,256 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-38:job-38) VM state transitted from :Stopped to Starting with
event: StartRequestedvm's original host id: null new host id: null host id before state transition: null
2013-07-09 16:30:22,256 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-38:job-38) Successfully transitioned to start state for VM[Us
er|hello-16] reservation id = 8b8d8303-0f78-4f03-9ee3-4cc3e129c746
2013-07-09 16:30:22,262 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-38:job-38) Trying to deploy VM, vm has dcId: 1 and podId: 1
2013-07-09 16:30:22,262 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-38:job-38) Deploy avoids pods: null, clusters: null, hosts: n
ull
2013-07-09 16:30:22,269 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Job-Executor-38:job-38) Deploy avoids pods: null, clusters: null,
hosts: null
2013-07-09 16:30:22,270 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Job-Executor-38:job-38) DeploymentPlanner allocation algorithm: co
m.cloud.deploy.FirstFitPlanner_EnhancerByCloudStack_b2132c10@13aee390
2013-07-09 16:30:22,270 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Job-Executor-38:job-38) Trying to allocate a host and storage pool
s from dc:1, pod:1,cluster:null, requested cpu: 100, requested ram: 262144000
2013-07-09 16:30:22,273 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Job-Executor-38:job-38) Is ROOT volume READY (pool already allocat
ed)?: No
2013-07-09 16:30:22,273 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-38:job-38) Searching resources only under specified Pod: 1
2013-07-09 16:30:22,273 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-38:job-38) Listing clusters in order of aggregate capacity, that ha
ve (atleast one host with) enough CPU and RAM capacity under this Pod: 1
2013-07-09 16:30:22,280 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Job-Executor-38:job-38) Checking resources in Cluster: 1 under Pod
: 1
2013-07-09 16:30:22,280 DEBUG [allocator.impl.FirstFitAllocator] (Job-Executor-38:job-38 FirstFitRoutingAllocator) Looking for hosts in dc: 1
pod:1  cluster:1
2013-07-09 16:30:22,282 DEBUG [allocator.impl.FirstFitAllocator] (Job-Executor-38:job-38 FirstFitRoutingAllocator) FirstFitAllocator has 1 hos
ts to check for allocation: [Host[-1-Routing]]


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira