You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@cloudstack.apache.org by "Rayees Namathponnan (JIRA)" <ji...@apache.org> on 2014/09/21 21:54:34 UTC

[jira] [Closed] (CLOUDSTACK-6674) [Automation] [DB lock] When KVM agent is alert state, agent never trying to connect back

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rayees Namathponnan closed CLOUDSTACK-6674.
-------------------------------------------

> [Automation] [DB lock] When KVM agent is alert state, agent never trying to connect back 
> -----------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-6674
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6674
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: KVM
>    Affects Versions: 4.4.0
>         Environment: KVM (RHEL 6.3)
> 4.4-forward
>            Reporter: Rayees Namathponnan
>            Assignee: edison su
>            Priority: Critical
>             Fix For: 4.4.0
>
>
> This issue observed in automation run,  when KVM agent is alert state, it never tries to connect back,  also observed SQL DB lock in MS log for AgentConnectTaskPool
> You need to restart agent manually to connect it back 
> Here is agent log 
> 2014-05-13 23:59:38,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
>  commands...
> 2014-05-13 23:59:38,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progre
> ss.
> 2014-05-13 23:59:43,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
>  commands...
> 2014-05-13 23:59:43,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progre
> ss.
> 2014-05-13 23:59:48,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
>  commands...
> 2014-05-13 23:59:48,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progre
> ss.
> 2014-05-13 23:59:53,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
>  commands...
> 2014-05-13 23:59:53,878 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progre
> ss.
> 2014-05-13 23:59:58,878 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
>  commands...
> 2014-05-13 23:59:58,878 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progress.
> ~
> MS log, please see the attached log for more info
> 2014-05-13 20:07:50,170 DEBUG [c.c.v.VirtualMachineManagerImpl] (API-Job-Executor-93:job-2240 ctx-f6efb16a) Destroying vm VM[User|i-137-304-QA]
> 2014-05-13 20:07:50,171 DEBUG [c.c.u.d.T.Transaction] (API-Job-Executor-65:job-2223 ctx-a60be5ae) Rolling back the transaction: Time = 291 Name =  API-Job-Ex
> ecutor-65; called by -TransactionLegacy.rollback:903-TransactionLegacy.removeUpTo:846-TransactionLegacy.close:670-TransactionContextInterceptor.invoke:36-Ref
> lectiveMethodInvocation.proceed:161-ExposeInvocationInterceptor.invoke:91-ReflectiveMethodInvocation.proceed:172-JdkDynamicAopProxy.invoke:204-$Proxy97.markA
> sUnavailable:-1-IpAddressManagerImpl$10.doInTransaction:1639-IpAddressManagerImpl$10.doInTransaction:1620-Transaction$2.doInTransaction:49
> 2014-05-13 20:07:50,173 WARN  [c.c.u.AccountManagerImpl] (API-Job-Executor-65:job-2223 ctx-a60be5ae) Failed to cleanup account Acct[20c063c7-9913-496d-983f-e
> fe18a92a9dc-test-account-TestScaleVmDynamicServiceOffering-test_change_so_running_vm_dynamic_to_dynamic_1_ADMIN] due to
> com.cloud.utils.exception.CloudRuntimeException: DB Exception on: com.mysql.jdbc.PreparedStatement@76384607: UPDATE user_ip_address SET user_ip_address.state
> ='Releasing' WHERE user_ip_address.id = 15
>         at com.cloud.utils.db.GenericDaoBase.update(GenericDaoBase.java:851)
>         at com.cloud.utils.db.GenericDaoBase.update(GenericDaoBase.java:1327)
>         at com.cloud.network.dao.IPAddressDaoImpl.markAsUnavailable(IPAddressDaoImpl.java:323)
>         at sun.reflect.GeneratedMethodAccessor536.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>         at com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>         at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>         at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>         at com.sun.proxy.$Proxy97.markAsUnavailable(Unknown Source)
>         at com.cloud.network.IpAddressManagerImpl$10.doInTransaction(IpAddressManagerImpl.java:1639)
>         at com.cloud.network.IpAddressManagerImpl$10.doInTransaction(IpAddressManagerImpl.java:1620)
>         at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:49)
>         at com.cloud.utils.db.Transaction.execute(Transaction.java:37)
>         at com.cloud.utils.db.Transaction.execute(Transaction.java:46)
>         at com.cloud.network.IpAddressManagerImpl.markIpAsUnavailable(IpAddressManagerImpl.java:1620)
>         at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.cleanupNetworkResources(NetworkOrchestrator.java:2616)
>         at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.destroyNetwork(NetworkOrchestrator.java:2177)
>         at com.cloud.user.AccountManagerImpl.cleanupAccount(AccountManagerImpl.java:815)
>         at com.cloud.user.AccountManagerImpl.deleteAccount(AccountManagerImpl.java:689)
>         at com.cloud.user.AccountManagerImpl.deleteUserAccount(AccountManagerImpl.java:1464)
>         at sun.reflect.GeneratedMethodAccessor555.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at com.cloud.user.AccountManagerImpl.deleteAccount(AccountManagerImpl.java:689)
>         at com.cloud.user.AccountManagerImpl.deleteUserAccount(AccountManagerImpl.java:1464)
>         at sun.reflect.GeneratedMethodAccessor555.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>         at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:106)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>         at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>         at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>         at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>         at com.sun.proxy.$Proxy100.deleteUserAccount(Unknown Source)
>         at org.apache.cloudstack.region.RegionManagerImpl.deleteUserAccount(RegionManagerImpl.java:187)
>         at org.apache.cloudstack.region.RegionServiceImpl.deleteUserAccount(RegionServiceImpl.java:121)
>         at org.apache.cloudstack.api.command.admin.account.DeleteAccountCmd.execute(DeleteAccountCmd.java:104)
>         at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:119)
>         at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:496)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:453)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
> =====================================
> 140513 20:07:49 INNODB MONITOR OUTPUT
> =====================================
> Per second averages calculated from the last 15 seconds
> ----------
> SEMAPHORES
> ----------
> OS WAIT ARRAY INFO: reservation count 86171, signal count 75053
> Mutex spin waits 0, rounds 1468094, OS waits 30100
> RW-shared spins 106570, OS waits 49606; RW-excl spins 8147, OS waits 6463
> ------------------------
> ------------------------
> LATEST FOREIGN KEY ERROR
> ------------------------
> 140513 18:36:59 Transaction:
> TRANSACTION 0 1892217502, ACTIVE 0 sec, process no 1449, OS thread id 140580633499392 inserting, thread declared inside InnoDB 500
> mysql tables in use 1, locked 1
> 5 lock struct(s), heap size 1216, 2 row lock(s), undo log entries 1
> MySQL thread id 3304, query id 1818700312 localhost 127.0.0.1 cloud update
> INSERT INTO op_ha_work (op_ha_work.instance_id, op_ha_work.mgmt_server_id, op_ha_work.created, op_ha_work.state, op_ha_work.host_id, op_ha_work.taken, op_ha_work.time_to_try, op_ha_work.type, op_ha_work.updated, op_ha_work.step, op_ha_work.vm_type, op_ha_work.tried) VALUES (152, null, '2014-05-14 01:36:59', 'Stopped', 0, null, 1367218183, 'HA', 3, 'Investigating', 'User', 1)
> Foreign key constraint fails for table `cloud`.`op_ha_work`:
> ,
>   CONSTRAINT `fk_op_ha_work__host_id` FOREIGN KEY (`host_id`) REFERENCES `host` (`id`)
> Trying to add in child table, in index `i_op_ha_work__host_id` tuple:
> DATA TUPLE: 2 fields;
>  0: len 8; hex 0000000000000000; asc         ;; 1: len 8; hex 000000000000001e; asc         ;;
> But in parent table `cloud`.`host`, in index `PRIMARY`,
> the closest match we can find is record:
> PHYSICAL RECORD: n_fields 50; compact format; info bits 0
>  0: len 8; hex 0000000000000001; asc         ;; 1: len 6; hex 000070c8e312; asc   p   ;; 2: len 7; hex 000000296610fd; asc    )f  ;; 3: len 25; hex 5261636b32486f737431312e6c61622e766d6f70732e636f6d; asc Rack2Host11.lab.vmops.com;; 4: len 30; hex 64333337373432652d653934642d343033302d626132392d336132653463; asc d337742e-e94d-4030-ba29-3a2e4c;...(truncated); 5: len 5; hex 416c657274; asc Alert;; 6: len 7; hex 526f7574696e67; asc Routing;; 7: len 30; hex 31302e3232332e35302e3636202020202020202020202020202020202020; asc 10.223.50.66                  ;...(truncated); 8: len 15; hex 3235352e3235352e3235352e313932; asc 255.255.255.192;; 9: len 17; hex 62633a33303a35623a64343a35333a3233; asc bc:30:5b:d4:53:23;; 10: len 30; hex 31302e3232332e35302e3636202020202020202020202020202020202020; asc 10.223.50.66                  ;...(truncated); 11: len 15; hex 3235352e3235352e3235352e313932; asc 255.255.255.192;; 12: len 17; hex 62633a33303a35623a64343a35333a3233; asc bc:30:5b:d4:53:23;; 13: SQL NULL; 14: SQL NULL; 15: SQL NULL; 16: len 8; hex 0000000000000001; asc         ;; 17: len 30; hex 31302e3232332e35302e3636202020202020202020202020202020202020; asc 10.223.50.66                  ;...(truncated); 18: len 15; hex 3235352e3235352e3235352e313932; asc 255.255.255.192;; 19: len 17; hex 62633a33303a35623a64343a35333a3233; asc bc:30:5b:d4:53:23;; 20: SQL NULL; 21: len 8; hex 0000000000000001; asc         ;; 22: len 8; hex 0000000000000001; asc         ;; 23: len 4; hex 00000001; asc     ;; 24: len 4; hex 00000004; asc     ;; 25: len 4; hex 000008d5; asc     ;; 26: SQL NULL; 27: SQL NULL; 28: len 3; hex 4b564d; asc KVM;; 29: SQL NULL; 30: len 8; hex 00000003e430c000; asc      0  ;; 31: SQL NULL; 32: len 14; hex 342e342e302d534e415053484f54; asc 4.4.0-SNAPSHOT;; 33: SQL NULL; 34: SQL NULL; 35: len 12; hex 68766d2c736e617073686f74; asc hvm,snapshot;; 36: len 30; hex 38313731636466622d306434372d333239632d393835382d383739633434; asc 8171cdfb-0d47-329c-9858-879c44;...(truncated); 37: len 4; hex 00000001; asc     ;; 38: len 4; hex 00000000; asc     ;; 39: len 8; hex 0000000000000000; asc         ;; 40: len 4; hex 517dfc67; asc Q} g;; 41: SQL NULL; 42: len 8; hex 8000125154230a85; asc    QT#  ;; 43: len 8; hex 80001251542304fb; asc    QT#  ;; 44: SQL NULL; 45: len 8; hex 0000000000000006; asc         ;; 46: len 7; hex 456e61626c6564; asc Enabled;; 47: SQL NULL; 48: SQL NULL; 49: len 8; hex 44697361626c6564; asc Disabled;;
> ------------------------
> LATEST DETECTED DEADLOCK
> ------------------------
> 140513 20:07:49
> *** (1) TRANSACTION:
> TRANSACTION 0 1899231769, ACTIVE 0 sec, process no 1449, OS thread id 140580624189184 starting index read
> mysql tables in use 1, locked 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)