Posted to issues@cloudstack.apache.org by "Milamber (JIRA)" <ji...@apache.org> on 2016/01/24 00:33:39 UTC

[jira] [Created] (CLOUDSTACK-9255) Unable to start VM DomainRouter due to error in finalizeStart, not retrying

Milamber created CLOUDSTACK-9255:
------------------------------------

             Summary: Unable to start VM DomainRouter due to error in finalizeStart, not retrying
                 Key: CLOUDSTACK-9255
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9255
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Virtual Router
    Affects Versions: 4.7.0, 4.6.2, 4.8.0, 4.7.1
         Environment: Ubuntu 14.04.3, KVM, NFS (primary/secondary)
            Reporter: Milamber



I've spent 3 days on the same issue: a network restart with cleanup fails (with a virtual router or a redundant virtual router) if the network has at least 20 virtual machines.

I've tested with CS 4.6.2, 4.7.0, 4.7.1RC1 and 4.8.0RC1, and all show the same problem. I've used the system VM from apt-get.eu and the latest builds from Jenkins.

My tests were run with the hosts and management server on Ubuntu 14.04.3 / KVM / NFS primary/secondary.

My test case (with Ansible modules):
1/ create a new network (normal or RVR)
2/ create 20 VMs (same parameters, only the name changes) and wait for the creation to finish
3/ restart the network with the cleanup option
4/ wait for the restart; after a few minutes, an error message appears: "Failed to restart network"

The trace in management.log is:

2016-01-23 23:02:51,503 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-51:ctx-9ed51622 job-268/job-271) (logid:b9a521fa) Unable to complete AsyncJobVO {id:271, userId: 2, accountId: 2, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAACAAAAAAAAAAIAAAAAAAAAMnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwc3IAEWphdmEudXRpbC5IYXNoTWFwBQfawcMWYNEDAAJGAApsb2FkRmFjdG9ySQAJdGhyZXNob2xkeHA_QAAAAAAADHcIAAAAEAAAAAF0AA5SZXN0YXJ0TmV0d29ya3QAP3JPMEFCWE55QUJGcVlYWmhMbXhoYm1jdVFtOXZiR1ZoYnMwZ2NvRFZuUHJ1QWdBQldnQUZkbUZzZFdWNGNBRXhw, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 146456419427, completeMsid: null, lastUpdated: null, lastPolled: null, created: Sat Jan 23 22:56:00 CET 2016}, job origin:268
com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-50-VM] due to error in finalizeStart, not retrying
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1119)
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4578)
    at sun.reflect.GeneratedMethodAccessor374.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
    at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4734)
    at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM[DomainRouter|r-50-VM] due to error in finalizeStart, not retrying
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1083)
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4578)
    at sun.reflect.GeneratedMethodAccessor374.invoke(Unknown Source)
    ... 17 more


During the network restart I can connect to the VR over SSH via its link-local address. The last lines shown are:

2016-01-23 22:02:39,780  configure.py __init__:128 AclIP created for rule ==> {'last_port': 65535, u'protocol': u'tcp', u'revoked': False, u'already_added': True, u'source_cidr_list': [u'0.0.0.0/0'], 'cidr': [u'0.0.0.0/0'], u'id': 52, u'src_ip': u'192.168.13.30', u'purpose': u'Firewall', 'allowed': True, 'action': 'ACCEPT', u'src_port_range': [1, 65535], u'traffic_type': u'Ingress', 'type': u'tcp', u'default_egress_policy': False, 'first_port': 1}
2016-01-23 22:02:39,780  configure.py add_rule:165 Current ACL IP direction is ==> ingress
2016-01-23 22:02:39,780  merge.py load:60 Loading data bag type forwardingrules

Broadcast message from root@r-50-VM (Sat Jan 23 22:02:45 2016):

The system is going down for system halt NOW!

Broadcast message from root@r-50-VM (Sat Jan 23 22:02:45 2016):

Power button pressed
The system is going down for system halt NOW!
/opt/cloud/bin/vr_cfg.sh: line 60: 16845 Killed                  /opt/cloud/bin/update_config.py vm_metadata.json
Sat Jan 23 22:02:46 UTC 2016 : VR config: executing failed: /opt/cloud/bin/update_config.py vm_metadata.json
Connection to 169.254.2.186 closed by remote host.
Connection to 169.254.2.186 closed.



Perhaps this is a timeout issue? If I create only one VM or 10 VMs, the network restart works.
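
To put rough numbers on the timeout idea, the timestamps quoted in the two logs above can be compared. This is only a sketch: it assumes the management.log timestamp is in CET (as the "created" field of the AsyncJobVO suggests) and that the VR clock is in UTC (as the "VR config: executing failed" line suggests). The names job_created, vr_halt, mgmt_error and elapsed_seconds are mine, for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the management server logs in CET (UTC+1), per the "created" field.
CET = timezone(timedelta(hours=1))

# Timestamps copied from the log excerpts in this report.
job_created = datetime(2016, 1, 23, 22, 56, 0, tzinfo=CET)          # AsyncJobVO "created" (job 271)
vr_halt = datetime(2016, 1, 23, 22, 2, 45, tzinfo=timezone.utc)     # "system halt" broadcast on r-50-VM (assumed UTC)
mgmt_error = datetime(2016, 1, 23, 23, 2, 51, 503000, tzinfo=CET)   # ERROR line in management.log


def elapsed_seconds(start, end):
    """Return the number of seconds between two timezone-aware datetimes."""
    return (end - start).total_seconds()


if __name__ == "__main__":
    print("job created -> VR halted :", elapsed_seconds(job_created, vr_halt), "s")
    print("job created -> mgmt error:", elapsed_seconds(job_created, mgmt_error), "s")
```

Under those assumptions the router was halted about 405 seconds (6 min 45 s) after the start job was created, and the management server logged the error about 6.5 seconds later. This does not prove that a timeout is the cause; it only shows how long the router had been configuring when it was shut down.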




