Posted to issues@cloudstack.apache.org by "Milamber (JIRA)" <ji...@apache.org> on 2016/03/20 15:56:33 UTC
[jira] [Closed] (CLOUDSTACK-9255) Unable to start VM DomainRouter due to error in finalizeStart, not retrying

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Milamber closed CLOUDSTACK-9255.
--------------------------------
       Resolution: Fixed
         Assignee: Wilder Rodrigues
    Fix Version/s: 4.9.0
                   4.7.2

> Unable to start VM DomainRouter due to error in finalizeStart, not retrying
> ---------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-9255
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9255
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Virtual Router
>    Affects Versions: 4.7.0, 4.6.2, 4.8.0, 4.7.1
>         Environment: Ubuntu 14.04.3
> KVM
> NFS (primary/secondary)
>            Reporter: Milamber
>            Assignee: Wilder Rodrigues
>             Fix For: 4.7.2, 4.9.0
>
>         Attachments: anon-rvr-2nd-after-20.log
>
>
> I've spent 3 days on the same issue: I am unable to restart a network with the clean-up option (with either a virtual router or a redundant virtual router) if the network has at least 20 virtual machines.
> I've tested with CS 4.6.2, 4.7.0, 4.7.1RC1 and 4.8.0RC1, and all show the same problem. I've used the system VM from apt-get.eu and the latest builds from Jenkins.
> My tests were run with the hosts and management server on Ubuntu 14.04.3 / KVM / NFS primary/secondary.
> My test case (with Ansible modules):
> 1/ create a new network (normal or RVR)
> 2/ create 20 VMs (same parameters, only the name changes)
> wait for the creation to finish
> 3/ restart the network with the clean-up option
> 4/ wait for the restart; after a few minutes, an error message appears: "Failed to restart network"
> The trace in management.log is:
> 2016-01-23 23:02:51,503 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-51:ctx-9ed51622 job-268/job-271) (logid:b9a521fa) Unable to complete AsyncJobVO {id:271, userId: 2, accountId: 2, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAACAAAAAAAAAAIAAAAAAAAAMnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwc3IAEWphdmEudXRpbC5IYXNoTWFwBQfawcMWYNEDAAJGAApsb2FkRmFjdG9ySQAJdGhyZXNob2xkeHA_QAAAAAAADHcIAAAAEAAAAAF0AA5SZXN0YXJ0TmV0d29ya3QAP3JPMEFCWE55QUJGcVlYWmhMbXhoYm1jdVFtOXZiR1ZoYnMwZ2NvRFZuUHJ1QWdBQldnQUZkbUZzZFdWNGNBRXhw, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 146456419427, completeMsid: null, lastUpdated: null, lastPolled: null, created: Sat Jan 23 22:56:00 CET 2016}, job origin:268
> com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-50-VM] due to error in finalizeStart, not retrying
>     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1119)
>     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4578)
>     at sun.reflect.GeneratedMethodAccessor374.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
>     at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4734)
>     at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
>     at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>     at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM[DomainRouter|r-50-VM] due to error in finalizeStart, not retrying
>     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1083)
>     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4578)
>     at sun.reflect.GeneratedMethodAccessor374.invoke(Unknown Source)
>     ... 17 more
> During the network restart I can connect to the VR over SSH via its link-local address; the last lines of output show:
> 2016-01-23 22:02:39,780  configure.py __init__:128 AclIP created for rule ==> {'last_port': 65535, u'protocol': u'tcp', u'revoked': False, u'already_added': True, u'source_cidr_list': [u'0.0.0.0/0'], 'cidr': [u'0.0.0.0/0'], u'id': 52, u'src_ip': u'192.168.13.30', u'purpose': u'Firewall', 'allowed': True, 'action': 'ACCEPT', u'src_port_range': [1, 65535], u'traffic_type': u'Ingress', 'type': u'tcp', u'default_egress_policy': False, 'first_port': 1}
> 2016-01-23 22:02:39,780  configure.py add_rule:165 Current ACL IP direction is ==> ingress
> 2016-01-23 22:02:39,780  merge.py load:60 Loading data bag type forwardingrules
> Broadcast message from root@r-50-VM (Sat Jan 23 22:02:45 2016):
> The system is going down for system halt NOW!
> Broadcast message from root@r-50-VM (Sat Jan 23 22:02:45 2016):
> Power button pressed
> The system is going down for system halt NOW!
> /opt/cloud/bin/vr_cfg.sh: line 60: 16845 Killed                  /opt/cloud/bin/update_config.py vm_metadata.json
> Sat Jan 23 22:02:46 UTC 2016 : VR config: executing failed: /opt/cloud/bin/update_config.py vm_metadata.json
> Connection to 169.254.2.186 closed by remote host.
> Connection to 169.254.2.186 closed.
> Perhaps this is a timeout issue? If I create 1 VM or 10 VMs, the network restart works.
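
One way to put a number on the timeout hypothesis is to compare the timestamps already present in the two logs quoted above. The following is a minimal sketch, not part of the original report: the variable names are hypothetical, it assumes the management server and the router have synchronized clocks, and it treats CET as UTC+1 (no daylight saving in January). It only measures elapsed time; it does not show which timeout, if any, was hit.

```python
from datetime import datetime, timedelta, timezone

# management.log is stamped in the management server's local time (CET);
# the router's "VR config: executing failed" line is stamped in UTC.
CET = timezone(timedelta(hours=1))  # assumption: CET = UTC+1 in January

# "created: Sat Jan 23 22:56:00 CET 2016" (AsyncJobVO id 271, VmWorkStart)
job_created = datetime(2016, 1, 23, 22, 56, 0, tzinfo=CET)

# "2016-01-23 23:02:51,503 ERROR [c.c.v.VmWorkJobDispatcher] ..."
job_failed = datetime(2016, 1, 23, 23, 2, 51, 503000, tzinfo=CET)

# "Sat Jan 23 22:02:46 UTC 2016 : VR config: executing failed: ..."
vr_config_failed = datetime(2016, 1, 23, 22, 2, 46, tzinfo=timezone.utc)

# Timezone-aware datetimes can be subtracted directly across zones.
elapsed_to_vr_failure = vr_config_failed - job_created
elapsed_to_job_error = job_failed - job_created

print("start job created -> VR config killed:", elapsed_to_vr_failure)
print("start job created -> job error logged:", elapsed_to_job_error)
```

Under those assumptions, the router was halted about 6 minutes 46 seconds after the start job for r-50-VM was created, and the management server logged the failure roughly 5 seconds later.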


