You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@cloudstack.apache.org by "Sangeetha Hariharan (JIRA)" <ji...@apache.org> on 2014/01/17 20:40:20 UTC

[jira] [Reopened] (CLOUDSTACK-5504) Vmware-Primary store unavailable for 10 mts - All snapshot tasks reported failure because of timing out after 20 minutes.But the snapshot process continues to succeed in Vmcenter after NFS was brought up.

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangeetha Hariharan reopened CLOUDSTACK-5504:
---------------------------------------------


Hi Likitha,

We have a global parameter - backup.snapshot.wait (set to 6 hours). In case of Vmware , this global parameter will not be  acknowledged ? We do expect the backup snapshot to take long , in general. But this seems to be restricted by vmware.vcenter.session.timeout parameter which is set to only 20 mts by default.  Should this default value be adjusted to be more realistic Or should this  be set dynamically per request basis?

Thanks
Sangeetha

> Vmware-Primary store unavailable for 10 mts - All snapshot tasks reported failure because of timing out after 20 minutes.But the snapshot process continues to succeed in Vmcenter after NFS was brought up.
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-5504
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5504
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Management Server
>    Affects Versions: 4.3.0
>         Environment: Build from 4.3
>            Reporter: Sangeetha Hariharan
>            Assignee: Likitha Shetty
>            Priority: Critical
>             Fix For: 4.3.0
>
>         Attachments: primarydown.rar
>
>
> Setup:
> Advanced zone set up with 2 5.1 ESXI host.
> 1. Deploy few Vms in each of the hosts  , so we start with 11 Vms.
> 2. Create snapshot for ROOT volumes.
> 3. When snapshot is still in progress , Make the primary storage unavailable for 10 mts.
> 4. Bring up the primary store after more than 10 mts.
> When the primary store was brought up , I see the snapshots that were in progress actually continue to download to secondary and succeed . 
> one of the snapshots that succeeded and fully available in secondary store:
> root@Rack3Host8 1c545037-1d1c-4927-918a-2f3975e1076b]# ls -ltr
> total 446808
> -rw-r--r--. 1 root root      6454 Dec 13 21:09 1c545037-1d1c-4927-918a-2f3975e1076b.ovf
> -rw-r--r--. 1 root root 457069056 Dec 13 21:09 1c545037-1d1c-4927-918a-2f3975e1076b-disk0.vmdk
> [root@Rack3Host8 1c545037-1d1c-4927-918a-2f3975e1076b]#
> But all the 11 snapshot tasks from Cloud Stack side report failure after about 20 minutes and then snapshots are put in "CreatedOnPrimary" state.
> Next scheduled hourly snapshot is attempted and succeeds.
> |        22 | CreatedOnPrimary | 2013-12-13 21:52:15 | NULL                |
> |        21 | CreatedOnPrimary | 2013-12-13 21:52:15 | NULL                |
> |        20 | CreatedOnPrimary | 2013-12-13 21:52:15 | NULL                |
> |        19 | CreatedOnPrimary | 2013-12-13 21:52:15 | NULL                |
> |        18 | CreatedOnPrimary | 2013-12-13 21:52:16 | NULL                |
> |        17 | CreatedOnPrimary | 2013-12-13 21:52:16 | NULL                |
> |        16 | CreatedOnPrimary | 2013-12-13 21:52:16 | NULL                |
> |        14 | CreatedOnPrimary | 2013-12-13 21:52:17 | NULL                |
> |        25 | CreatedOnPrimary | 2013-12-13 21:52:17 | NULL                |
> |        24 | CreatedOnPrimary | 2013-12-13 21:52:17 | NULL                |
> |        23 | CreatedOnPrimary | 2013-12-13 21:52:18 | NULL                |
> |        22 | BackedUp         | 2013-12-13 22:42:15 | NULL                |
> |        21 | BackedUp         | 2013-12-13 22:42:15 | NULL                |
> |        20 | BackedUp         | 2013-12-13 22:42:15 | NULL                |
> |        19 | BackedUp         | 2013-12-13 22:42:15 | NULL                |
> |        18 | BackedUp         | 2013-12-13 22:42:15 | NULL                |
> |        17 | BackedUp         | 2013-12-13 22:42:16 | NULL                |
> |        16 | BackedUp         | 2013-12-13 22:42:16 | NULL                |
> |        14 | BackedUp         | 2013-12-13 22:42:16 | NULL                |
> |        25 | BackedUp         | 2013-12-13 22:42:17 | NULL                |
> |        24 | BackedUp         | 2013-12-13 22:42:17 | NULL                |
> |        23 | BackedUp         | 2013-12-13 22:42:17 | NULL                |
> +-----------+------------------+---------------------+---------------------+
> 86 rows in set (0.00 sec)
> 2013-12-13 16:52:17,720 DEBUG [c.c.a.t.Request] (Job-Executor-5:ctx-1d3bd6cc ctx-837a59a5) Seq 5-422576170: Sending  { Cmd , MgmtId: 95307354844397, via: 5(s-10-VM), Ver: v1, Flags: 100011, [{"org.apache.cloudstack.storage.command.CopyCommand":{"srcTO":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"ffa0b125-d1d9-4524-bd9e-03178914845b","volume":{"uuid":"15189035-3592-41ac-b2bc-a39d247e7d2f","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"a0c555cc-695c-3343-bfa0-3413a91dbfed","id":1,"poolType":"NetworkFilesystem","host":"10.223.57.195","path":"/export/home/vmware/primary","port":2049,"url":"NetworkFilesystem://10.223.57.195//export/home/vmware/primary/?ROLE=Primary&STOREUUID=a0c555cc-695c-3343-bfa0-3413a91dbfed"}},"name":"ROOT-18","size":2147483648,"path":"ROOT-18","volumeId":18,"vmName":"i-4-18-VM","accountId":4,"chainInfo":"{\"diskDeviceBusName\":\"ide0:1\",\"diskChain\":[\"[a0c555cc695c3343bfa03413a91dbfed] i-4-18-VM/ROOT-18.vmdk\"]}","format":"OVA","id":18,"deviceId":0,"hypervisorType":"VMware"},"parentSnapshotPath":"0476f8f4-61b4-40d2-8e04-001a119fc294","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"a0c555cc-695c-3343-bfa0-3413a91dbfed","id":1,"poolType":"NetworkFilesystem","host":"10.223.57.195","path":"/export/home/vmware/primary","port":2049,"url":"NetworkFilesystem://10.223.57.195//export/home/vmware/primary/?ROLE=Primary&STOREUUID=a0c555cc-695c-3343-bfa0-3413a91dbfed"}},"vmName":"i-4-18-VM","name":"TestVM-tiny-host-0ps-0-2_ROOT-18_20131213215216","hypervisorType":"VMware","id":69,"quiescevm":false}},"destTO":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"snapshots/4/18","volume":{"uuid":"15189035-3592-41ac-b2bc-a39d247e7d2f","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"a0c555cc-695c-3343-bfa0-3413a91dbfed","id":1,"poolType":"NetworkFilesystem","host":"10.223.57.195","path":"/export/home/vmware/primary","port":2049,"url":"NetworkFilesystem://10.223.57.195//export/home/vmware/primary/?ROLE=Primary&STOREUUID=a0c555cc-695c-3343-bfa0-3413a91dbfed"}},"name":"ROOT-18","size":2147483648,"path":"ROOT-18","volumeId":18,"vmName":"i-4-18-VM","accountId":4,"chainInfo":"{\"diskDeviceBusName\":\"ide0:1\",\"diskChain\":[\"[a0c555cc695c3343bfa03413a91dbfed] i-4-18-VM/ROOT-18.vmdk\"]}","format":"OVA","id":18,"deviceId":0,"hypervisorType":"VMware"},"parentSnapshotPath":"snapshots/4/18/730b8477-d29f-45e8-b2a2-fb26e2ac220b/730b8477-d29f-45e8-b2a2-fb26e2ac220b","dataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://10.223.57.194/export/home/vmware/secondary","_role":"Image"}},"vmName":"i-4-18-VM","name":"TestVM-tiny-host-0ps-0-2_ROOT-18_20131213215216","hypervisorType":"VMware","id":69,"quiescevm":false}},"executeInSequence":false,"wait":21600}}] }
> 2013-12-13 17:13:44,250 DEBUG [c.c.a.m.AgentAttache] (AgentConnectTaskPool-3:ctx-1a3a5155) Seq 5-422576
> 170: Sending disconnect to class com.cloud.agent.manager.SynchronousListener
> 2013-12-13 17:13:44,252 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-5:ctx-1d3bd6cc ctx-837a59a5) Seq 5-422576170: Waiting some more time because this is the current command
> 2013-12-13 17:13:44,252 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-5:ctx-1d3bd6cc ctx-837a59a5) Seq 5-422576170: Waiting some more time because this is the current command
> 2013-12-13 17:13:44,252 INFO  [c.c.u.e.CSExceptionErrorCode] (Job-Executor-5:ctx-1d3bd6cc ctx-837a59a5) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
> 2013-12-13 17:13:44,256 WARN  [c.c.a.m.AgentAttache] (Job-Executor-5:ctx-1d3bd6cc ctx-837a59a5) Seq 5-422576170: Timed out on Seq 5-422576170:  { Cmd , MgmtId: 95307354844397, via: 5(s-10-VM), Ver: v1, Flags: 100011, [{"org.apache.cloudstack.storage.command.CopyCommand":{"srcTO":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"ffa0b125-d1d9-4524-bd9e-03178914845b","volume":{"uuid":"15189035-3592-41ac-b2bc-a39d247e7d2f","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"a0c555cc-695c-3343-bfa0-3413a91dbfed","id":1,"poolType":"NetworkFilesystem","host":"10.223.57.195","path":"/export/home/vmware/primary","port":2049,"url":"NetworkFilesystem://10.223.57.195//export/home/vmware/primary/?ROLE=Primary&STOREUUID=a0c555cc-695c-3343-bfa0-3413a91dbfed"}},"name":"ROOT-18","size":2147483648,"path":"ROOT-18","volumeId":18,"vmName":"i-4-18-VM","accountId":4,"chainInfo":"{\"diskDeviceBusName\":\"ide0:1\",\"diskChain\":[\"[a0c555cc695c3343bfa03413a91dbfed] i-4-18-VM/ROOT-18.vmdk\"]}","format":"OVA","id":18,"deviceId":0,"hypervisorType":"VMware"},"parentSnapshotPath":"0476f8f4-61b4-40d2-8e04-001a119fc294","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"a0c555cc-695c-3343-bfa0-3413a91dbfed","id":1,"poolType":"NetworkFilesystem","host":"10.223.57.195","path":"/export/home/vmware/primary","port":2049,"url":"NetworkFilesystem://10.223.57.195//export/home/vmware/primary/?ROLE=Primary&STOREUUID=a0c555cc-695c-3343-bfa0-3413a91dbfed"}},"vmName":"i-4-18-VM","name":"TestVM-tiny-host-0ps-0-2_ROOT-18_20131213215216","hypervisorType":"VMware","id":69,"quiescevm":false}},"destTO":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"snapshots/4/18","volume":{"uuid":"15189035-3592-41ac-b2bc-a39d247e7d2f","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"a0c555cc-695c-3343-bfa0-3413a91dbfed","id":1,"poolType":"NetworkFilesystem","host":"10.223.57.195","path":"/export/home/vmware/primary","port":2049,"url":"NetworkFilesystem://10.223.57.195//export/home/vmware/primary/?ROLE=Primary&STOREUUID=a0c555cc-695c-3343-bfa0-3413a91dbfed"}},"name":"ROOT-18","size":2147483648,"path":"ROOT-18","volumeId":18,"vmName":"i-4-18-VM","accountId":4,"chainInfo":"{\"diskDeviceBusName\":\"ide0:1\",\"diskChain\":[\"[a0c555cc695c3343bfa03413a91dbfed] i-4-18-VM/ROOT-18.vmdk\"]}","format":"OVA","id":18,"deviceId":0,"hypervisorType":"VMware"},"parentSnapshotPath":"snapshots/4/18/730b8477-d29f-45e8-b2a2-fb26e2ac220b/730b8477-d29f-45e8-b2a2-fb26e2ac220b","dataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://10.223.57.194/export/home/vmware/secondary","_role":"Image"}},"vmName":"i-4-18-VM","name":"TestVM-tiny-host-0ps-0-2_ROOT-18_20131213215216","hypervisorType":"VMware","id":69,"quiescevm":false}},"executeInSequence":false,"wait":21600}}] }
> 2013-12-13 17:13:44,294 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-5:ctx-1d3bd6cc ctx-837a59a5) Seq 5-422576170: Cancelling.
> 2013-12-13 17:13:44,305 DEBUG [o.a.c.s.RemoteHostEndPoint] (Job-Executor-5:ctx-1d3bd6cc ctx-837a59a5) Failed to send command, due to Agent:2, com.cloud.exception.OperationTimedoutException: Commands 422576170 to Host 5 timed out after 43200
> 2013-12-13 17:13:44,305 DEBUG [o.a.c.s.m.AncientDataMotionStrategy] (Job-Executor-5:ctx-1d3bd6cc ctx-837a59a5) copy snasphot failed: com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:2, com.cloud.exception.OperationTimedoutException: Commands 422576170 to Host 5 timed out after 43200
> 2013-12-13 17:13:44,305 DEBUG [o.a.c.s.m.AncientDataMotionStrategy] (Job-Executor-5:ctx-1d3bd6cc ctx-837a59a5) copy failed
> 2013-12-13 17:13:44,440 DEBUG [c.c.s.s.SnapshotManagerImpl] (Job-Executor-5:ctx-1d3bd6cc ctx-837a59a5) Failed to create snapshot
> com.cloud.utils.exception.CloudRuntimeException: com.cloud.utils.exception.CloudRuntimeException: com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:2, com.cloud.exception.OperationTimedoutException: Commands 422576170 to Host 5 timed out after 43200
>         at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:275)
>         at org.apache.cloudstack.storage.snapshot.XenserverSnapshotStrategy.backupSnapshot(XenserverSnapshotStrategy.java:135)
>         at org.apache.cloudstack.storage.snapshot.XenserverSnapshotStrategy.takeSnapshot(XenserverSnapshotStrategy.java:294)
>         at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:951)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:601)
>         at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>         at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>         at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>         at $Proxy161.takeSnapshot(Unknown Source)
>         at org.apache.cloudstack.storage.volume.VolumeServiceImpl.takeSnapshot(VolumeServiceImpl.java:1341)
>         at com.cloud.storage.VolumeApiServiceImpl.takeSnapshot(VolumeApiServiceImpl.java:1486)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:601)
>         at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>         at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)