You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@cloudstack.apache.org by "Ryan Farrington (JIRA)" <ji...@apache.org> on 2015/10/15 22:40:05 UTC

[jira] [Updated] (CLOUDSTACK-8949) Job timeout caused by improper/inconsistent application of wait parameters

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-8949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Farrington updated CLOUDSTACK-8949:
----------------------------------------
    Description: 
when creating a "MigrateVolumeCommand" a timeout should be set, this way the ACS would not use the default value of parameter "wait".  That timeout value should be the same as the one used on CitrixResourceBase and its children to control the migration of volumes.

Environment:    XenServer 6.2, multiple pre-setup primary storage locations that are iscsi SRs in XenServer.

Summary of actions:      Attempt to move a volume that is attached to a running instance to another primary storage location.


What *I* assume should happen:    Cloudstack communicates with the host of the VM in question and issues a xe vdi-pool-migrate.   The command is synchronous and doesn't return control back until it has performed the action.   Cloudstack should know this is a synchronous call and wait the configured amount of time for the migration to be performed.


What *I* think is happening(with some education from Rafael Weingärtner):    There are multiple timers being set in the process (please reference  




{noformat}
2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Sending  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}] }
2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Executing:  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}] }
2015-10-12 16:41:20,457 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-494:ctx-5aac3f29) Seq 38-996939857: Executing request
2015-10-12 17:41:20,457 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some more time because this is the current command
2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some more time because this is the current command
2015-10-12 18:41:20,457 INFO  [c.c.u.e.CSExceptionErrorCode] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
2015-10-12 18:41:20,465 WARN  [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed out on Seq 38-996939857:  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}] }
2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Cancelling.
2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: No more commands found
2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after 7200
2015-10-12 18:41:20,471 DEBUG [o.a.c.s.m.AncientDataMotionStrategy] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) copy failed
com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after 7200
        at org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:116)
        at org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:382)
        at org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.copyAsync(AncientDataMotionStrategy.java:421)
        at org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:70)
        at org.apache.cloudstack.storage.volume.VolumeServiceImpl.migrateVolume(VolumeServiceImpl.java:931)
        at com.cloud.storage.VolumeApiServiceImpl.liveMigrateVolume(VolumeApiServiceImpl.java:1680)
        at com.cloud.storage.VolumeApiServiceImpl.orchestrateMigrateVolume(VolumeApiServiceImpl.java:1666)
        at com.cloud.storage.VolumeApiServiceImpl.migrateVolume(VolumeApiServiceImpl.java:1622)
        at sun.reflect.GeneratedMethodAccessor335.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:622)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
        at com.sun.proxy.$Proxy196.migrateVolume(Unknown Source)
        at org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd.execute(MigrateVolumeCmd.java:103)
        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:161)
        at com.cloud.api.ApiAsyncJobDispatcher.runJobInContext(ApiAsyncJobDispatcher.java:109)
        at com.cloud.api.ApiAsyncJobDispatcher$1.run(ApiAsyncJobDispatcher.java:66)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:63)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:701)
2015-10-12 18:41:20,479 WARN  [o.a.c.s.d.ObjectInDataStoreManagerImpl] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Unsupported data object (VOLUME, org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f), no need to delete from object in store ref table
2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) migrate volume failed:com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after 7200
2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d) Complete async job-5257, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed to migrate volume"}
2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d) Done executing org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
2015-10-12 18:41:20,489 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job monitoring
{noformat}


  was:
when creating a "MigrateVolumeCommand" a timeout should be set, this way the ACS would not use the default value of parameter "wait".  That timeout value should be the same as the one used on CitrixResourceBase and its children to control the migration of volumes.



> Job timeout caused by improper/inconsistent application of wait parameters
> --------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-8949
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8949
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Hypervisor Controller
>    Affects Versions: 4.3.0
>         Environment: XenServer 6.2
>            Reporter: Ryan Farrington
>            Priority: Minor
>
> when creating a "MigrateVolumeCommand" a timeout should be set, this way the ACS would not use the default value of parameter "wait".  That timeout value should be the same as the one used on CitrixResourceBase and its children to control the migration of volumes.
> Environment:    XenServer 6.2, multiple pre-setup primary storage locations that are iscsi SRs in XenServer.
> Summary of actions:      Attempt to move a volume that is attached to a running instance to another primary storage location.
> What *I* assume should happen:    Cloudstack communicates with the host of the VM in question and issues a xe vdi-pool-migrate.   The command is synchronous and doesn't return control back until it has performed the action.   Cloudstack should know this is a synchronous call and wait the configured amount of time for the migration to be performed.
> What *I* think is happening(with some education from Rafael Weingärtner):    There are multiple timers being set in the process (please reference  
> {noformat}
> 2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Sending  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}] }
> 2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Executing:  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}] }
> 2015-10-12 16:41:20,457 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-494:ctx-5aac3f29) Seq 38-996939857: Executing request
> 2015-10-12 17:41:20,457 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some more time because this is the current command
> 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some more time because this is the current command
> 2015-10-12 18:41:20,457 INFO  [c.c.u.e.CSExceptionErrorCode] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
> 2015-10-12 18:41:20,465 WARN  [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed out on Seq 38-996939857:  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}] }
> 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Cancelling.
> 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: No more commands found
> 2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after 7200
> 2015-10-12 18:41:20,471 DEBUG [o.a.c.s.m.AncientDataMotionStrategy] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) copy failed
> com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after 7200
>         at org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:116)
>         at org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:382)
>         at org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.copyAsync(AncientDataMotionStrategy.java:421)
>         at org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:70)
>         at org.apache.cloudstack.storage.volume.VolumeServiceImpl.migrateVolume(VolumeServiceImpl.java:931)
>         at com.cloud.storage.VolumeApiServiceImpl.liveMigrateVolume(VolumeApiServiceImpl.java:1680)
>         at com.cloud.storage.VolumeApiServiceImpl.orchestrateMigrateVolume(VolumeApiServiceImpl.java:1666)
>         at com.cloud.storage.VolumeApiServiceImpl.migrateVolume(VolumeApiServiceImpl.java:1622)
>         at sun.reflect.GeneratedMethodAccessor335.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:622)
>         at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>         at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>         at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>         at com.sun.proxy.$Proxy196.migrateVolume(Unknown Source)
>         at org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd.execute(MigrateVolumeCmd.java:103)
>         at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:161)
>         at com.cloud.api.ApiAsyncJobDispatcher.runJobInContext(ApiAsyncJobDispatcher.java:109)
>         at com.cloud.api.ApiAsyncJobDispatcher$1.run(ApiAsyncJobDispatcher.java:66)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:63)
>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:701)
> 2015-10-12 18:41:20,479 WARN  [o.a.c.s.d.ObjectInDataStoreManagerImpl] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Unsupported data object (VOLUME, org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f), no need to delete from object in store ref table
> 2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) migrate volume failed:com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after 7200
> 2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d) Complete async job-5257, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed to migrate volume"}
> 2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d) Done executing org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
> 2015-10-12 18:41:20,489 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job monitoring
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)