You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cloudstack.apache.org by GitBox <gi...@apache.org> on 2020/01/22 19:19:06 UTC

[GitHub] [cloudstack] richardlawley opened a new issue #3829: Unable to schedule async job - dangling vm_work_job rows

richardlawley opened a new issue #3829: Unable to schedule async job - dangling vm_work_job rows
URL: https://github.com/apache/cloudstack/issues/3829
 
 
   <!--
   Verify first that your issue/request is not already reported on GitHub.
   Also test if the latest release and master branch are affected too.
   Always add information AFTER of these HTML comments, but no need to delete the comments.
   -->
   
   ##### ISSUE TYPE
   <!-- Pick one below and delete the rest -->
    * Bug Report
   
   ##### COMPONENT NAME
   <!--
   Categorize the issue, e.g. API, VR, VPN, UI, etc.
   -->
   ~~~
   Management Server
   ~~~
   
   ##### CLOUDSTACK VERSION
   <!--
   New line separated list of affected versions, commit ID for issues on master branch.
   -->
   
   ~~~
   4.11.2, appears unfixed in master
   ~~~
   
   ##### CONFIGURATION
   <!--
   Information about the configuration if relevant, e.g. basic network, advanced networking, etc.  N/A otherwise
   -->
   N/A
   
   ##### OS / ENVIRONMENT
   <!--
   Information about the environment if relevant, N/A otherwise
   -->
   N/A
   
   ##### SUMMARY
   <!-- Explain the problem/feature briefly -->
   AsyncJobManager contains code to expunge async job records after a configurable time (24h default).  Some async jobs have corresponding vm_work_job entries, which are created using the generated key of async_job.  The expunge code only removes the async_job record, leaving dangling vm_work_job rows.
   
   Because the supported version of mysql recalculates autonumber sequences based on the last current row, this can cause async jobs to be created with IDs referencing existing rows in vm_work_job, which causes a primary key violation if this clashes, showing the following log message:
   
   ```
   Unable to schedule async job for command com.cloud.vm.VmWorkStart, unexpected exception. 
   javax.persistence.EntityExistsException: Entity already exists
   ```
   
   The set of circumstances to trigger this is a bit obscure, which is probably why this has been mentioned on the mailing list a few times but never with any fix or explanation.  I believe the only fix required is to ensure that when jobs are purged from async_jobs, they are also purged from vm_work_job.
   
   For anyone else encountering this bug (hopefully the error text will be enough!), the workaround is to remove all vm_work_job rows above the current max async_job, i.e. 
   
   ```
   DELETE FROM vm_work_job WHERE id > (SELECT MAX(id) FROM async_job)
   ```
   
   ##### STEPS TO REPRODUCE
   <!--
   For bugs, show exactly how to reproduce the problem, using a minimal test-case. Use Screenshots if accurate.
   
   For new features, show how the feature would be used.
   -->
   
   * On a clean management server, run some VM jobs to ensure that vm_work_job rows are present.
   * Wait for `job.expire.minutes` so that mgmt server can clean out the async_jobs table
   * Observe that vm_work_job rows are still present
   * Restart mysql server (perhaps mgmt server too, unsure)
   * Try to start/stop VMs until the async_job id clashes with a vm_work_job row.
   
   <!-- You can also paste gist.github.com links for larger files -->
   
   ##### EXPECTED RESULTS
   <!-- What did you expect to happen when running the steps above? -->
   
   VM should start/stop correctly
   
   ##### ACTUAL RESULTS
   <!-- What actually happened? -->
   
   Error Starting/Stopping VM, with log message as below:
   
   ```
   2020-01-22 09:40:28,492 WARN  [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-13:ctx-e8081448 job-4920 ctx-ad7b892d) (logid:08136108) Unable to schedule async job for command com.cloud.vm.VmWorkStop, unexpected exception.
   javax.persistence.EntityExistsException: Entity already exists:
           at com.cloud.utils.db.GenericDaoBase.persist(GenericDaoBase.java:1434)
           at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$1.doInTransaction(AsyncJobManagerImpl.java:235)
           at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$1.doInTransaction(AsyncJobManagerImpl.java:231)
           at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:50)
           at com.cloud.utils.db.Transaction.execute(Transaction.java:40)
           at com.cloud.utils.db.Transaction.execute(Transaction.java:47)
           at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl.submitAsyncJob(AsyncJobManagerImpl.java:231)
           at com.cloud.vm.VirtualMachineManagerImpl.stopVmThroughJobQueue(VirtualMachineManagerImpl.java:4505)
           at com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1600)
           at com.cloud.vm.VirtualMachineManagerImpl.destroy(VirtualMachineManagerImpl.java:1882)
           at org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.destroyVirtualMachine(VMEntityManagerImpl.java:271)
           at org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.destroy(VirtualMachineEntityImpl.java:233)
           at com.cloud.vm.UserVmManagerImpl.destroyVm(UserVmManagerImpl.java:4534)
           at com.cloud.vm.UserVmManagerImpl.destroyVm(UserVmManagerImpl.java:2758)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:338)
           at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:197)
           at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
           at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:107)
           at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:174)
           at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)
           at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:174)
           at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
           at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
           at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
           at com.sun.proxy.$Proxy164.destroyVm(Unknown Source)
           at org.apache.cloudstack.api.command.admin.vm.DestroyVMCmdByAdmin.execute(DestroyVMCmdByAdmin.java:46)
           at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:150)
           at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
           at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:581)
           at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
           at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
           at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
           at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
           at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
           at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:529)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Duplicate entry '4921' for key 'PRIMARY'
           at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
           at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
           at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
           ... 59 more
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [cloudstack] rhtyd closed issue #3829: Unable to schedule async job - dangling vm_work_job rows

Posted by GitBox <gi...@apache.org>.
rhtyd closed issue #3829: Unable to schedule async job - dangling vm_work_job rows
URL: https://github.com/apache/cloudstack/issues/3829
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services