You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@cloudstack.apache.org by "Ove Ewerlid (JIRA)" <ji...@apache.org> on 2013/05/20 07:17:15 UTC
[jira] [Created] (CLOUDSTACK-2568) ACS41 regression in storage subsystem (seen with local storage and 2 or more hosts)

Ove Ewerlid created CLOUDSTACK-2568:
---------------------------------------

             Summary: ACS41 regression in storage subsystem (seen with local storage and 2 or more hosts)
                 Key: CLOUDSTACK-2568
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-2568
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Storage Controller
    Affects Versions: 4.1.0
         Environment: RHES64 as in OEL64. Install from RPM built from latest GIT on OEL64.
2 or more KVM hypervisors with local storage in one cluster that has one primary NFS storage pool.
            Reporter: Ove Ewerlid
            Priority: Blocker
             Fix For: 4.1.0


ACS402 works with no issues when tested in exactly the setup where ACS41 fails.
Identical configuration (the same setup program is used for testing both versions).

In ACS410 startVM fails if and only if the advanceStart: log line picks a poolID that is not valid.
E.g., the poolID reported in this logline appears random across a large number of tests.
If a poolID that can not be reached by the host selected for deployment, the startVM fails.

This is blocking upgrade from 4.0 to 4.1 since there is no  reliable way to start VMs that have been deployed. If a deployed VM fails to start, giving the startVM command multiple times, will eventually make the VM start.
The more hosts there are, the less likely it is a startVM will succeed. It is less likely that the poolID is correct.

The below log portion conveys how the VM has a "correct" Deployment Destination reported and the advanceStart reports a poolID that is different and since the selected hypervisor can not reach the poolID the startVM fails.

The bug never triggers if there is only one KVM with local storage since the poolId can not be wrong, there is just one (and the NFS pool is always valid).

-------------------
2013-05-20 06:49:29,477 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Found a potential host id: 1 name: vm3-net0-s0-14.test.devops and associated storage pools for this VM
2013-05-20 06:49:29,478 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Returning Deployment Destination: Dest[Zone(Id)-Pod(Id)-Cluster(Id)-Host(Id)-Storage(Volume(Id|Type-->Pool(Id))] : Dest[Zone(1)-Pod(1)-Cluster(1)-Host\
(1)-Storage(Volume(10|ROOT-->Pool(200), Volume(11|DATADISK-->Pool(200))]
2013-05-20 06:49:29,495 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-34:job-34) VM state transitted from :Stopped to Starting with event: StartRequestedvm's original host id: null new host id: null host id before state trans\
ition: null
2013-05-20 06:49:29,495 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-34:job-34) Successfully transitioned to start state for VM[User|testvm-a] reservation id = e644d55e-3627-4395-9f89-639e6fc2f261
2013-05-20 06:49:29,502 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-34:job-34) Trying to deploy VM, vm has dcId: 1 and podId: null
2013-05-20 06:49:29,502 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-34:job-34) advanceStart: DeploymentPlan is provided, using dcId:1, podId: 1, clusterId: 1, hostId: 1, poolId: 201
2013-05-20 06:49:29,502 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-34:job-34) Deploy avoids pods: null, clusters: null, hosts: null
2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) DeploymentPlanner allocation algorithm: random
2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Trying to allocate a host and storage pools from dc:1, pod:1,cluster:1, requested cpu: 4000, requested ram: 2147483648
2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Is ROOT volume READY (pool already allocated)?: Yes
2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) DeploymentPlan has host_id specified, making no checks on this host, looks like admin test: 1
2013-05-20 06:49:29,505 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Looking for suitable pools for this host under zone: 1, pod: 1, cluster: 1
2013-05-20 06:49:29,506 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Checking suitable pools for volume (Id, Type): (10,ROOT)
2013-05-20 06:49:29,506 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Volume has pool(201) already allocated, checking if pool can be reused, poolId: null
2013-05-20 06:49:29,506 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) finding pool by id '201'
2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Planner need not allocate a pool for this volume since its READY
2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Checking suitable pools for volume (Id, Type): (11,DATADISK)
2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Volume has pool(201) already allocated, checking if pool can be reused, poolId: null
2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) finding pool by id '201'
2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Planner need not allocate a pool for this volume since its READY
2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Trying to find a potenial host and associated storage pools from the suitable host/pool lists for this VM
2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Checking if host: 1 can access any suitable storage pool for volume: DATADISK
2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Host: 1 cannot access pool: 201
2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Could not find a potential host that has associated storage pools from the suitable host/pool lists for this VM
2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Cannnot deploy to specified host, returning.
2013-05-20 06:49:29,524 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-34:job-34) VM state transitted from :Starting to Stopped with event: OperationFailedvm's original host id: null new host id: null host id before state tran\
sition: null
2013-05-20 06:49:29,533 ERROR [cloud.async.AsyncJobManagerImpl] (Job-Executor-34:job-34) Unexpected exception while executing org.apache.cloudstack.api.command.user.vm.StartVMCmd
com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[User|testvm-a]Scope=interface com.cloud.dc.DataCenter; id=1
        at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:728)
        at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:471)
        at org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.deployVirtualMachine(VMEntityManagerImpl.java:212)
        at org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.deploy(VirtualMachineEntityImpl.java:209)
        at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:3865)
        at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2573)
        at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
        at org.apache.cloudstack.api.command.user.vm.StartVMCmd.execute(StartVMCmd.java:120)
        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:162)
        at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:437)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira