You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@cloudstack.apache.org by "edison su (JIRA)" <ji...@apache.org> on 2014/06/09 23:27:02 UTC

[jira] [Commented] (CLOUDSTACK-6674) [Automation] [DB lock] When KVM agent is alert state, agent never trying to connect back

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025769#comment-14025769 ] 

edison su commented on CLOUDSTACK-6674:
---------------------------------------

KVM agent won't reconnect back, is because their are jobs running on kvm host, the current logic is that, unless all the jobs finished, kvm agent won't connect back to mgt server. Why there are jobs not finished, maybe due to storage error, such as hanging on access primary/secondary storage etc.

But there is a DB deadlock exception in the log, may need DB expert to take a look it.

> [Automation] [DB lock] When KVM agent is alert state, agent never trying to connect back 
> -----------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-6674
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6674
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: KVM
>    Affects Versions: 4.4.0
>         Environment: KVM (RHEL 6.3)
> 4.4-forward
>            Reporter: Rayees Namathponnan
>            Assignee: edison su
>            Priority: Critical
>             Fix For: 4.4.0
>
>
> This issue observed in automation run,  when KVM agent is alert state, it never tries to connect back,  also observed SQL DB lock in MS log for AgentConnectTaskPool
> You need to restart agent manually to connect it back 
> Here is agent log 
> 2014-05-13 23:59:38,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
>  commands...
> 2014-05-13 23:59:38,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progre
> ss.
> 2014-05-13 23:59:43,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
>  commands...
> 2014-05-13 23:59:43,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progre
> ss.
> 2014-05-13 23:59:48,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
>  commands...
> 2014-05-13 23:59:48,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progre
> ss.
> 2014-05-13 23:59:53,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
>  commands...
> 2014-05-13 23:59:53,878 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progre
> ss.
> 2014-05-13 23:59:58,878 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
>  commands...
> 2014-05-13 23:59:58,878 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progress.
> ~
> MS log, please see the attached log for more info
> 2014-05-13 20:07:50,170 DEBUG [c.c.v.VirtualMachineManagerImpl] (API-Job-Executor-93:job-2240 ctx-f6efb16a) Destroying vm VM[User|i-137-304-QA]
> 2014-05-13 20:07:50,171 DEBUG [c.c.u.d.T.Transaction] (API-Job-Executor-65:job-2223 ctx-a60be5ae) Rolling back the transaction: Time = 291 Name =  API-Job-Ex
> ecutor-65; called by -TransactionLegacy.rollback:903-TransactionLegacy.removeUpTo:846-TransactionLegacy.close:670-TransactionContextInterceptor.invoke:36-Ref
> lectiveMethodInvocation.proceed:161-ExposeInvocationInterceptor.invoke:91-ReflectiveMethodInvocation.proceed:172-JdkDynamicAopProxy.invoke:204-$Proxy97.markA
> sUnavailable:-1-IpAddressManagerImpl$10.doInTransaction:1639-IpAddressManagerImpl$10.doInTransaction:1620-Transaction$2.doInTransaction:49
> 2014-05-13 20:07:50,173 WARN  [c.c.u.AccountManagerImpl] (API-Job-Executor-65:job-2223 ctx-a60be5ae) Failed to cleanup account Acct[20c063c7-9913-496d-983f-e
> fe18a92a9dc-test-account-TestScaleVmDynamicServiceOffering-test_change_so_running_vm_dynamic_to_dynamic_1_ADMIN] due to
> com.cloud.utils.exception.CloudRuntimeException: DB Exception on: com.mysql.jdbc.PreparedStatement@76384607: UPDATE user_ip_address SET user_ip_address.state
> ='Releasing' WHERE user_ip_address.id = 15
>         at com.cloud.utils.db.GenericDaoBase.update(GenericDaoBase.java:851)
>         at com.cloud.utils.db.GenericDaoBase.update(GenericDaoBase.java:1327)
>         at com.cloud.network.dao.IPAddressDaoImpl.markAsUnavailable(IPAddressDaoImpl.java:323)
>         at sun.reflect.GeneratedMethodAccessor536.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>         at com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>         at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>         at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>         at com.sun.proxy.$Proxy97.markAsUnavailable(Unknown Source)
>         at com.cloud.network.IpAddressManagerImpl$10.doInTransaction(IpAddressManagerImpl.java:1639)
>         at com.cloud.network.IpAddressManagerImpl$10.doInTransaction(IpAddressManagerImpl.java:1620)
>         at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:49)
>         at com.cloud.utils.db.Transaction.execute(Transaction.java:37)
>         at com.cloud.utils.db.Transaction.execute(Transaction.java:46)
>         at com.cloud.network.IpAddressManagerImpl.markIpAsUnavailable(IpAddressManagerImpl.java:1620)
>         at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.cleanupNetworkResources(NetworkOrchestrator.java:2616)
>         at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.destroyNetwork(NetworkOrchestrator.java:2177)
>         at com.cloud.user.AccountManagerImpl.cleanupAccount(AccountManagerImpl.java:815)
>         at com.cloud.user.AccountManagerImpl.deleteAccount(AccountManagerImpl.java:689)
>         at com.cloud.user.AccountManagerImpl.deleteUserAccount(AccountManagerImpl.java:1464)
>         at sun.reflect.GeneratedMethodAccessor555.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at com.cloud.user.AccountManagerImpl.deleteAccount(AccountManagerImpl.java:689)
>         at com.cloud.user.AccountManagerImpl.deleteUserAccount(AccountManagerImpl.java:1464)
>         at sun.reflect.GeneratedMethodAccessor555.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>         at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:106)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>         at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>         at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>         at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>         at com.sun.proxy.$Proxy100.deleteUserAccount(Unknown Source)
>         at org.apache.cloudstack.region.RegionManagerImpl.deleteUserAccount(RegionManagerImpl.java:187)
>         at org.apache.cloudstack.region.RegionServiceImpl.deleteUserAccount(RegionServiceImpl.java:121)
>         at org.apache.cloudstack.api.command.admin.account.DeleteAccountCmd.execute(DeleteAccountCmd.java:104)
>         at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:119)
>         at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:496)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:453)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
> =====================================
> 140513 20:07:49 INNODB MONITOR OUTPUT
> =====================================
> Per second averages calculated from the last 15 seconds
> ----------
> SEMAPHORES
> ----------
> OS WAIT ARRAY INFO: reservation count 86171, signal count 75053
> Mutex spin waits 0, rounds 1468094, OS waits 30100
> RW-shared spins 106570, OS waits 49606; RW-excl spins 8147, OS waits 6463
> ------------------------
> ------------------------
> LATEST FOREIGN KEY ERROR
> ------------------------
> 140513 18:36:59 Transaction:
> TRANSACTION 0 1892217502, ACTIVE 0 sec, process no 1449, OS thread id 140580633499392 inserting, thread declared inside InnoDB 500
> mysql tables in use 1, locked 1
> 5 lock struct(s), heap size 1216, 2 row lock(s), undo log entries 1
> MySQL thread id 3304, query id 1818700312 localhost 127.0.0.1 cloud update
> INSERT INTO op_ha_work (op_ha_work.instance_id, op_ha_work.mgmt_server_id, op_ha_work.created, op_ha_work.state, op_ha_work.host_id, op_ha_work.taken, op_ha_work.time_to_try, op_ha_work.type, op_ha_work.updated, op_ha_work.step, op_ha_work.vm_type, op_ha_work.tried) VALUES (152, null, '2014-05-14 01:36:59', 'Stopped', 0, null, 1367218183, 'HA', 3, 'Investigating', 'User', 1)
> Foreign key constraint fails for table `cloud`.`op_ha_work`:
> ,
>   CONSTRAINT `fk_op_ha_work__host_id` FOREIGN KEY (`host_id`) REFERENCES `host` (`id`)
> Trying to add in child table, in index `i_op_ha_work__host_id` tuple:
> DATA TUPLE: 2 fields;
>  0: len 8; hex 0000000000000000; asc         ;; 1: len 8; hex 000000000000001e; asc         ;;
> But in parent table `cloud`.`host`, in index `PRIMARY`,
> the closest match we can find is record:
> PHYSICAL RECORD: n_fields 50; compact format; info bits 0
>  0: len 8; hex 0000000000000001; asc         ;; 1: len 6; hex 000070c8e312; asc   p   ;; 2: len 7; hex 000000296610fd; asc    )f  ;; 3: len 25; hex 5261636b32486f737431312e6c61622e766d6f70732e636f6d; asc Rack2Host11.lab.vmops.com;; 4: len 30; hex 64333337373432652d653934642d343033302d626132392d336132653463; asc d337742e-e94d-4030-ba29-3a2e4c;...(truncated); 5: len 5; hex 416c657274; asc Alert;; 6: len 7; hex 526f7574696e67; asc Routing;; 7: len 30; hex 31302e3232332e35302e3636202020202020202020202020202020202020; asc 10.223.50.66                  ;...(truncated); 8: len 15; hex 3235352e3235352e3235352e313932; asc 255.255.255.192;; 9: len 17; hex 62633a33303a35623a64343a35333a3233; asc bc:30:5b:d4:53:23;; 10: len 30; hex 31302e3232332e35302e3636202020202020202020202020202020202020; asc 10.223.50.66                  ;...(truncated); 11: len 15; hex 3235352e3235352e3235352e313932; asc 255.255.255.192;; 12: len 17; hex 62633a33303a35623a64343a35333a3233; asc bc:30:5b:d4:53:23;; 13: SQL NULL; 14: SQL NULL; 15: SQL NULL; 16: len 8; hex 0000000000000001; asc         ;; 17: len 30; hex 31302e3232332e35302e3636202020202020202020202020202020202020; asc 10.223.50.66                  ;...(truncated); 18: len 15; hex 3235352e3235352e3235352e313932; asc 255.255.255.192;; 19: len 17; hex 62633a33303a35623a64343a35333a3233; asc bc:30:5b:d4:53:23;; 20: SQL NULL; 21: len 8; hex 0000000000000001; asc         ;; 22: len 8; hex 0000000000000001; asc         ;; 23: len 4; hex 00000001; asc     ;; 24: len 4; hex 00000004; asc     ;; 25: len 4; hex 000008d5; asc     ;; 26: SQL NULL; 27: SQL NULL; 28: len 3; hex 4b564d; asc KVM;; 29: SQL NULL; 30: len 8; hex 00000003e430c000; asc      0  ;; 31: SQL NULL; 32: len 14; hex 342e342e302d534e415053484f54; asc 4.4.0-SNAPSHOT;; 33: SQL NULL; 34: SQL NULL; 35: len 12; hex 68766d2c736e617073686f74; asc hvm,snapshot;; 36: len 30; hex 38313731636466622d306434372d333239632d393835382d383739633434; asc 8171cdfb-0d47-329c-9858-879c44;...(truncated); 37: len 4; hex 00000001; asc     ;; 38: len 4; hex 00000000; asc     ;; 39: len 8; hex 0000000000000000; asc         ;; 40: len 4; hex 517dfc67; asc Q} g;; 41: SQL NULL; 42: len 8; hex 8000125154230a85; asc    QT#  ;; 43: len 8; hex 80001251542304fb; asc    QT#  ;; 44: SQL NULL; 45: len 8; hex 0000000000000006; asc         ;; 46: len 7; hex 456e61626c6564; asc Enabled;; 47: SQL NULL; 48: SQL NULL; 49: len 8; hex 44697361626c6564; asc Disabled;;
> ------------------------
> LATEST DETECTED DEADLOCK
> ------------------------
> 140513 20:07:49
> *** (1) TRANSACTION:
> TRANSACTION 0 1899231769, ACTIVE 0 sec, process no 1449, OS thread id 140580624189184 starting index read
> mysql tables in use 1, locked 1



--
This message was sent by Atlassian JIRA
(v6.2#6252)