You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@cloudstack.apache.org by "Rayees Namathponnan (JIRA)" <ji...@apache.org> on 2014/05/14 20:55:15 UTC

[jira] [Created] (CLOUDSTACK-6674) [Automation] [DB lock] When KVM agent is alert state, agent never trying to connect back

Rayees Namathponnan created CLOUDSTACK-6674:
-----------------------------------------------

             Summary: [Automation] [DB lock] When KVM agent is alert state, agent never trying to connect back 
                 Key: CLOUDSTACK-6674
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6674
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: KVM
    Affects Versions: 4.4.0
         Environment: KVM (RHEL 6.3)
4.4-forward
            Reporter: Rayees Namathponnan
            Assignee: edison su
            Priority: Blocker
             Fix For: 4.4.0


This issue observed in automation run,  when KVM agent is alert state, it never tries to connect back,  also observed SQL DB lock in MS log for AgentConnectTaskPool


You need to restart agent manually to connect it back 

Here is agent log 
2014-05-13 23:59:38,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
 commands...
2014-05-13 23:59:38,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progre
ss.
2014-05-13 23:59:43,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
 commands...
2014-05-13 23:59:43,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progre
ss.
2014-05-13 23:59:48,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
 commands...
2014-05-13 23:59:48,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progre
ss.
2014-05-13 23:59:53,877 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
 commands...
2014-05-13 23:59:53,878 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progre
ss.
2014-05-13 23:59:58,878 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining
 commands...
2014-05-13 23:59:58,878 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progress.
~


MS log 

2014-05-13 20:07:50,170 DEBUG [c.c.v.VirtualMachineManagerImpl] (API-Job-Executor-93:job-2240 ctx-f6efb16a) Destroying vm VM[User|i-137-304-QA]
2014-05-13 20:07:50,171 DEBUG [c.c.u.d.T.Transaction] (API-Job-Executor-65:job-2223 ctx-a60be5ae) Rolling back the transaction: Time = 291 Name =  API-Job-Ex
ecutor-65; called by -TransactionLegacy.rollback:903-TransactionLegacy.removeUpTo:846-TransactionLegacy.close:670-TransactionContextInterceptor.invoke:36-Ref
lectiveMethodInvocation.proceed:161-ExposeInvocationInterceptor.invoke:91-ReflectiveMethodInvocation.proceed:172-JdkDynamicAopProxy.invoke:204-$Proxy97.markA
sUnavailable:-1-IpAddressManagerImpl$10.doInTransaction:1639-IpAddressManagerImpl$10.doInTransaction:1620-Transaction$2.doInTransaction:49
2014-05-13 20:07:50,173 WARN  [c.c.u.AccountManagerImpl] (API-Job-Executor-65:job-2223 ctx-a60be5ae) Failed to cleanup account Acct[20c063c7-9913-496d-983f-e
fe18a92a9dc-test-account-TestScaleVmDynamicServiceOffering-test_change_so_running_vm_dynamic_to_dynamic_1_ADMIN] due to
com.cloud.utils.exception.CloudRuntimeException: DB Exception on: com.mysql.jdbc.PreparedStatement@76384607: UPDATE user_ip_address SET user_ip_address.state
='Releasing' WHERE user_ip_address.id = 15
        at com.cloud.utils.db.GenericDaoBase.update(GenericDaoBase.java:851)
        at com.cloud.utils.db.GenericDaoBase.update(GenericDaoBase.java:1327)
        at com.cloud.network.dao.IPAddressDaoImpl.markAsUnavailable(IPAddressDaoImpl.java:323)
        at sun.reflect.GeneratedMethodAccessor536.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
        at com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
        at com.sun.proxy.$Proxy97.markAsUnavailable(Unknown Source)
        at com.cloud.network.IpAddressManagerImpl$10.doInTransaction(IpAddressManagerImpl.java:1639)
        at com.cloud.network.IpAddressManagerImpl$10.doInTransaction(IpAddressManagerImpl.java:1620)
        at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:49)
        at com.cloud.utils.db.Transaction.execute(Transaction.java:37)
        at com.cloud.utils.db.Transaction.execute(Transaction.java:46)
        at com.cloud.network.IpAddressManagerImpl.markIpAsUnavailable(IpAddressManagerImpl.java:1620)
        at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.cleanupNetworkResources(NetworkOrchestrator.java:2616)
        at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.destroyNetwork(NetworkOrchestrator.java:2177)
        at com.cloud.user.AccountManagerImpl.cleanupAccount(AccountManagerImpl.java:815)
        at com.cloud.user.AccountManagerImpl.deleteAccount(AccountManagerImpl.java:689)
        at com.cloud.user.AccountManagerImpl.deleteUserAccount(AccountManagerImpl.java:1464)
        at sun.reflect.GeneratedMethodAccessor555.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.cloud.user.AccountManagerImpl.deleteAccount(AccountManagerImpl.java:689)
        at com.cloud.user.AccountManagerImpl.deleteUserAccount(AccountManagerImpl.java:1464)
        at sun.reflect.GeneratedMethodAccessor555.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
        at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:106)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
        at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
        at com.sun.proxy.$Proxy100.deleteUserAccount(Unknown Source)
        at org.apache.cloudstack.region.RegionManagerImpl.deleteUserAccount(RegionManagerImpl.java:187)
        at org.apache.cloudstack.region.RegionServiceImpl.deleteUserAccount(RegionServiceImpl.java:121)
        at org.apache.cloudstack.api.command.admin.account.DeleteAccountCmd.execute(DeleteAccountCmd.java:104)
        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:119)
        at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:496)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:453)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction


=====================================
140513 20:07:49 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 15 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 86171, signal count 75053
Mutex spin waits 0, rounds 1468094, OS waits 30100
RW-shared spins 106570, OS waits 49606; RW-excl spins 8147, OS waits 6463
------------------------
------------------------
LATEST FOREIGN KEY ERROR
------------------------
140513 18:36:59 Transaction:
TRANSACTION 0 1892217502, ACTIVE 0 sec, process no 1449, OS thread id 140580633499392 inserting, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1216, 2 row lock(s), undo log entries 1
MySQL thread id 3304, query id 1818700312 localhost 127.0.0.1 cloud update
INSERT INTO op_ha_work (op_ha_work.instance_id, op_ha_work.mgmt_server_id, op_ha_work.created, op_ha_work.state, op_ha_work.host_id, op_ha_work.taken, op_ha_work.time_to_try, op_ha_work.type, op_ha_work.updated, op_ha_work.step, op_ha_work.vm_type, op_ha_work.tried) VALUES (152, null, '2014-05-14 01:36:59', 'Stopped', 0, null, 1367218183, 'HA', 3, 'Investigating', 'User', 1)
Foreign key constraint fails for table `cloud`.`op_ha_work`:
,
  CONSTRAINT `fk_op_ha_work__host_id` FOREIGN KEY (`host_id`) REFERENCES `host` (`id`)
Trying to add in child table, in index `i_op_ha_work__host_id` tuple:
DATA TUPLE: 2 fields;
 0: len 8; hex 0000000000000000; asc         ;; 1: len 8; hex 000000000000001e; asc         ;;

But in parent table `cloud`.`host`, in index `PRIMARY`,
the closest match we can find is record:
PHYSICAL RECORD: n_fields 50; compact format; info bits 0
 0: len 8; hex 0000000000000001; asc         ;; 1: len 6; hex 000070c8e312; asc   p   ;; 2: len 7; hex 000000296610fd; asc    )f  ;; 3: len 25; hex 5261636b32486f737431312e6c61622e766d6f70732e636f6d; asc Rack2Host11.lab.vmops.com;; 4: len 30; hex 64333337373432652d653934642d343033302d626132392d336132653463; asc d337742e-e94d-4030-ba29-3a2e4c;...(truncated); 5: len 5; hex 416c657274; asc Alert;; 6: len 7; hex 526f7574696e67; asc Routing;; 7: len 30; hex 31302e3232332e35302e3636202020202020202020202020202020202020; asc 10.223.50.66                  ;...(truncated); 8: len 15; hex 3235352e3235352e3235352e313932; asc 255.255.255.192;; 9: len 17; hex 62633a33303a35623a64343a35333a3233; asc bc:30:5b:d4:53:23;; 10: len 30; hex 31302e3232332e35302e3636202020202020202020202020202020202020; asc 10.223.50.66                  ;...(truncated); 11: len 15; hex 3235352e3235352e3235352e313932; asc 255.255.255.192;; 12: len 17; hex 62633a33303a35623a64343a35333a3233; asc bc:30:5b:d4:53:23;; 13: SQL NULL; 14: SQL NULL; 15: SQL NULL; 16: len 8; hex 0000000000000001; asc         ;; 17: len 30; hex 31302e3232332e35302e3636202020202020202020202020202020202020; asc 10.223.50.66                  ;...(truncated); 18: len 15; hex 3235352e3235352e3235352e313932; asc 255.255.255.192;; 19: len 17; hex 62633a33303a35623a64343a35333a3233; asc bc:30:5b:d4:53:23;; 20: SQL NULL; 21: len 8; hex 0000000000000001; asc         ;; 22: len 8; hex 0000000000000001; asc         ;; 23: len 4; hex 00000001; asc     ;; 24: len 4; hex 00000004; asc     ;; 25: len 4; hex 000008d5; asc     ;; 26: SQL NULL; 27: SQL NULL; 28: len 3; hex 4b564d; asc KVM;; 29: SQL NULL; 30: len 8; hex 00000003e430c000; asc      0  ;; 31: SQL NULL; 32: len 14; hex 342e342e302d534e415053484f54; asc 4.4.0-SNAPSHOT;; 33: SQL NULL; 34: SQL NULL; 35: len 12; hex 68766d2c736e617073686f74; asc hvm,snapshot;; 36: len 30; hex 38313731636466622d306434372d333239632d393835382d383739633434; asc 8171cdfb-0d47-329c-9858-879c44;...(truncated); 37: len 4; hex 00000001; asc     ;; 38: len 4; hex 00000000; asc     ;; 39: len 8; hex 0000000000000000; asc         ;; 40: len 4; hex 517dfc67; asc Q} g;; 41: SQL NULL; 42: len 8; hex 8000125154230a85; asc    QT#  ;; 43: len 8; hex 80001251542304fb; asc    QT#  ;; 44: SQL NULL; 45: len 8; hex 0000000000000006; asc         ;; 46: len 7; hex 456e61626c6564; asc Enabled;; 47: SQL NULL; 48: SQL NULL; 49: len 8; hex 44697361626c6564; asc Disabled;;

------------------------
LATEST DETECTED DEADLOCK
------------------------
140513 20:07:49
*** (1) TRANSACTION:
TRANSACTION 0 1899231769, ACTIVE 0 sec, process no 1449, OS thread id 140580624189184 starting index read
mysql tables in use 1, locked 1









--
This message was sent by Atlassian JIRA
(v6.2#6252)