Posted to issues@cloudstack.apache.org by "Koushik Das (JIRA)" <ji...@apache.org> on 2016/12/09 07:18:58 UTC
[jira] [Commented] (CLOUDSTACK-9660) NPE while destroying volumes during 1000 VMs deploy and destroy tests

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734567#comment-15734567 ] 

Koushik Das commented on CLOUDSTACK-9660:
-----------------------------------------

The NPE is seen because the storage cleanup interval is set to a low value. For typical deployments the interval is usually set to 1 day. The chance of hitting this issue is lower at higher values, but it can still happen occasionally.

Two threads try to destroy the same volume simultaneously. In this case the NPE is seen in the storage cleanup thread.
- Destroy VM thread deleting the ROOT volume
- Storage cleanup thread also deleting the same ROOT volume

The lifecycle of a ROOT volume is tied to its VM, so it makes sense to remove it as part of VM destroy. Currently the storage cleanup thread also destroys all types of volumes (including ROOT). The two paths therefore overlap, and that overlap leads to the NPE described. The fix would be to exclude ROOT volumes in the storage cleanup thread, as these are destroyed as part of VM destroy/expunge.
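
A rough sketch of the proposed exclusion, for illustration only. The class, enum, and method names here (CleanupFilterSketch, VolumeType, Volume, volumesForCleanup) are hypothetical stand-ins and are not the actual CloudStack classes or the actual patch. It only shows the idea of filtering ROOT volumes out of the list the cleanup thread works on, so that the VM destroy/expunge path is the only one that removes them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not the real CloudStack types.
public class CleanupFilterSketch {

    enum VolumeType { ROOT, DATADISK }

    static final class Volume {
        final long id;
        final VolumeType type;

        Volume(long id, VolumeType type) {
            this.id = id;
            this.type = type;
        }
    }

    // Returns the volumes the storage cleanup thread should expunge.
    // ROOT volumes are skipped because VM destroy/expunge already removes them.
    static List<Volume> volumesForCleanup(List<Volume> destroyedVolumes) {
        List<Volume> result = new ArrayList<>();
        for (Volume v : destroyedVolumes) {
            if (v.type != VolumeType.ROOT) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Volume> destroyed = new ArrayList<>();
        destroyed.add(new Volume(1L, VolumeType.ROOT));      // owned by the VM destroy path
        destroyed.add(new Volume(2L, VolumeType.DATADISK));  // left to the cleanup thread

        for (Volume v : volumesForCleanup(destroyed)) {
            System.out.println("cleanup thread would expunge volume " + v.id);
        }
    }
}
```

With this kind of filter only one thread ever attempts to expunge a given ROOT volume, so the cleanup thread no longer races with VM expunge on a record that may already be gone.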



> NPE while destroying volumes during 1000 VMs deploy and destroy tests
> ---------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-9660
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9660
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Management Server
>    Affects Versions: 4.10.0.0
>            Reporter: Koushik Das
>            Assignee: Koushik Das
>             Fix For: 4.10.0.0
>
>
> Steps:
> 1. Install and configure a zone (advanced or basic).
> 2. Set config storage.cleanup.enabled = true and storage.cleanup.interval = 10 seconds
> 3. Deploy 1000 VMs and then destroy over multiple iterations.
> NPE seen in MS logs while deleting volume:
> 2015-06-18 16:27:47,797 DEBUG [c.c.v.VirtualMachineManagerImpl] (UserVm-Scavenger-1:ctx-5ecc886e) (logid:132e3ff8) Cleaning up hypervisor data structures (ex. SRs in XenServer) for managed storage
> 2015-06-18 16:27:47,799 DEBUG [o.a.c.e.o.VolumeOrchestrator] (UserVm-Scavenger-1:ctx-5ecc886e) (logid:132e3ff8) Cleaning storage for vm: 2894
> 2015-06-18 16:27:47,823 INFO [o.a.c.s.v.VolumeServiceImpl] (UserVm-Scavenger-1:ctx-5ecc886e) (logid:132e3ff8) Expunge volume with no data store specified
> 2015-06-18 16:27:47,828 DEBUG [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:ctx-5e7b4eda) (logid:bb642325) Storage pool garbage collector found 0 templates to clean up in storage pool: XenRT-Zone-0-Pod-0-Cluster-0-Primary-Store-0
> 2015-06-18 16:27:47,828 INFO [o.a.c.s.v.VolumeServiceImpl] (UserVm-Scavenger-1:ctx-5ecc886e) (logid:132e3ff8) Volume 2894 is not referred anywhere, remove it from volumes table
> 2015-06-18 16:27:47,829 DEBUG [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:ctx-5e7b4eda) (logid:bb642325) Storage pool garbage collector found 0 templates to clean up in storage pool: XenRT-Zone-0-Pod-0-Cluster-1-Primary-Store-0
> 2015-06-18 16:27:47,832 DEBUG [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:ctx-5e7b4eda) (logid:bb642325) Secondary storage garbage collector found 0 templates to cleanup on template_store_ref for store: nfs://10.81.56.7/xenrtnfs/1092931-dycPsK
> 2015-06-18 16:27:47,833 DEBUG [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:ctx-5e7b4eda) (logid:bb642325) Secondary storage garbage collector found 0 snapshots to cleanup on snapshot_store_ref for store: nfs://10.81.56.7/xenrtnfs/1092931-dycPsK
> 2015-06-18 16:27:47,834 DEBUG [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:ctx-5e7b4eda) (logid:bb642325) Secondary storage garbage collector found 0 volumes to cleanup on volume_store_ref for store: nfs://10.81.56.7/xenrtnfs/1092931-dycPsK
> 2015-06-18 16:27:47,842 DEBUG [c.c.v.VirtualMachineManagerImpl] (UserVm-Scavenger-1:ctx-5ecc886e) (logid:132e3ff8) Expunged VM[User|i-10-2894-VM]
> 2015-06-18 16:27:47,844 WARN [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:ctx-5e7b4eda) (logid:bb642325) Unable to destroy volume 0b22f54b-3242-49ef-b16d-1c7801d5c2bd
> java.lang.NullPointerException
> at org.apache.cloudstack.storage.volume.VolumeServiceImpl.expungeVolumeAsync(VolumeServiceImpl.java:276)
> at com.cloud.storage.StorageManagerImpl.cleanupStorage(StorageManagerImpl.java:1121)
> at com.cloud.storage.StorageManagerImpl$StorageGarbageCollector.runInContext(StorageManagerImpl.java:1481)
> at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
> at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> 2015-06-18 16:27:47,850 DEBUG [c.c.u.AccountManagerImpl] (UserVm-Scavenger-1:ctx-5ecc886e) (logid:132e3ff8) Access granted to Acct[ed48b7f2-15a0-11e5-96dd-d275a7df156a-system] to Domain:1/ by AffinityGroupAccessChecker
> 2015-06-18 16:27:47,871 DEBUG [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-1:ctx-5ecc886e) (logid:132e3ff8) Starting cleaning up vm VM[User|i-10-2894-VM] resources...


