Posted to users@cloudstack.apache.org by Andrei Mikhailovsky <an...@arhont.com> on 2015/12/18 02:55:00 UTC

Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Hello guys,

I've just upgraded to 4.6.2 and have similar issues with three virtual routers out of 22 in total. They are all failing exactly the same way as described here.

Has anyone found a permanent workaround for this issue?

Thanks

Andrei

----- Original Message -----
> From: "Stephan Seitz" <s....@secretresearchfacility.com>
> To: "users" <us...@cloudstack.apache.org>
> Sent: Monday, 30 November, 2015 19:53:57
> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

> Does anybody else experience problems due to (very) slow deployment of
> VRs?
> 
> 
> Am Dienstag, den 24.11.2015, 16:31 +0100 schrieb Stephan Seitz:
>> Update / FYI:
>> After faking the particular VRu in sql, I tried to restart that
>> network,
>> and it always fails. To me it looks like update_config.py - which
>> takes almost all CPU resources - runs way longer than any watchdog will
>> accept.
>> 
>> I'm able to mitigate that by very nasty workarounds:
>> a) start the router
>> b) wait until it's provisioned
>> c) restart cloudstack-management
>> d)  update vm_instance
>> set state='Running',
>> power_state='PowerOn' where name = 'r-XXX-VM';
>> e) once: update domain_router
>> set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC
>> 2015",
>> scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
>> where id=XXX;
>> f) wait until update_config.py finishes (for me that's about 15
>> minutes)
>> 
>> Since I expect the need for VR restarts in the future, this behaviour is
>> somewhat unsatisfying. It needs a lot of error-prone intervention.
>> 
>> I'm quite unsure whether it was introduced with the update or whether the
>> particular VR simply has not been restarted since being configured with
>> lots of IPs and rules.
>> 
>> 
>> Am Dienstag, den 24.11.2015, 12:29 +0100 schrieb Stephan Seitz:
>> > Hi List!
>> > 
>> > After upgrading from 4.5.2 to 4.6.0 I faced a problem with one
>> > virtual router. This particular VR has about 10 IPs with LB and FW rules
>> > defined. During the upgrade process, after about 4-5 minutes, a
>> > watchdog kicks in and kills the respective VR due to no response.
>> > 
>> > So far I didn't find any timeout value in the global settings.
>> > Temporarily setting network.router.EnableServiceMonitoring to false
>> > doesn't change the behaviour.
>> > 
>> > Any help on how to mitigate that nasty timeout would be really
>> > appreciated :)
>> > 
>> > cheers,
>> > 
>> > Stephan
>> > 
>> > From within the VR, the logs show
>> > 
>> > 2015-11-24 11:24:33,807  CsFile.py search:123 Searching for
>> > dhcp-range=interface:eth0,set:interface and replacing with
>> > dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
>> > 2015-11-24 11:24:33,808  merge.py load:56 Creating data bag type
>> > guestnetwork
>> > 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,15 and replacing with
>> > dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
>> > 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,6 and replacing with
>> > dhcp-option=tag:interface
>> > -eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
>> > 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,3, and replacing with
>> > dhcp-option=tag:interface-eth0,3,10.10.22.1
>> > 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,1, and replacing with
>> > dhcp-option=tag:interface-eth0,1,255.255.255.0
>> > 2015-11-24 11:24:33,810  CsHelper.py execute:160 Executing: service
>> > dnsmasq restart
>> > 
>> > ==> /var/log/messages <==
>> > Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system
>> > halt
>> > 
>> > Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
>> > 
>> > The system is going down for system halt NOW!
>> > Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
>> > 
>> > ==> /var/log/cloud.log <==
>> > /opt/cloud/bin/vr_cfg.sh: line 60:  6603
>> > Killed                  /opt/cloud/bin/update_config.py
>> > vm_dhcp_entry.json
>> > 
>> > ==> /var/log/messages <==
>> > Nov 24 11:24:35 r-504-VM cloud: VR config: executing
>> > failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>> > 
>> > ==> /var/log/cloud.log <==
>> > Tue Nov 24 11:24:35 UTC 2015 : VR config: executing
>> > failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>> > Connection to 169.254.2.192 closed by remote host.
>> > Connection to 169.254.2.192 closed.
>> > 
>> > 
>> > the management-server.log shows
>> > 
>> > 2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> > (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing
>> > com.cloud.vm.VmWorkStart for job-5164
>> > 2015-11-24 12:24:43,017 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
>> > (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job
>> > -5164
>> > from job monitoring
>> > 2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher]
>> > (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception
>> > while
>> > executing org.apache.cloudstack.api.command.admin.
>> > router.StartRouterCmd
>> > com.cloud.exception.AgentUnavailableException: Resource [Host:1] is
>> > unreachable: Host 1: Unable to start instance due to Unable to
>> > start
>> > VM[DomainRouter|r-504-VM] due to error in f
>> > inalizeStart, not retrying
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>> > ineManagerImpl.java:1121)
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>> > ineManagerImpl.java:4580)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> > Method)
>> >         at
>> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImp
>> > l.java:57)
>> >         at
>> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcc
>> > essorImpl.java:43)
>> >         at java.lang.reflect.Method.invoke(Method.java:606)
>> >         at
>> > com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandler
>> > Proxy.java:107)
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachi
>> > neManagerImpl.java:4736)
>> >         at
>> > com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:10
>> > 2)
>> >         at
>> > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>> > $5.runInContext(AsyncJobManagerImpl.java:537)
>> >         at
>> > org.apache.cloudstack.managed.context.ManagedContextRunnable
>> > $1.run(ManagedContextRunnable.java:49)
>> >         at
>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext
>> > $1.call(DefaultManagedContext.java:56)
>> >         at
>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ca
>> > llWithContext(DefaultManagedContext.java:103)
>> >         at
>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ru
>> > nWithContext(DefaultManagedContext.java:53)
>> >         at
>> > org.apache.cloudstack.managed.context.ManagedContextRunnable.run(Ma
>> > nagedContextRunnable.java:46)
>> >         at
>> > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>> > $5.run(AsyncJobManagerImpl.java:494)
>> >         at java.util.concurrent.Executors
>> > $RunnableAdapter.call(Executors.java:471)
>> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >         at
>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecuto
>> > r.java:1145)
>> >         at java.util.concurrent.ThreadPoolExecutor
>> > $Worker.run(ThreadPoolExecutor.java:615)
>> >         at java.lang.Thread.run(Thread.java:745)
>> > Caused by: com.cloud.utils.exception.ExecutionException: Unable to
>> > start
>> > VM[DomainRouter|r-504-VM] due to error in finalizeStart, not
>> > retrying
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>> > ineManagerImpl.java:1085)
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>> > ineManagerImpl.java:4580)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> > Method)
>> >         ... 18 more
>> > 2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> > (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163,
>> > jobStatus: FAILED, resultCode: 530, result: org.
>> > apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[
>> > ],"errorcode":530,"errortext":"Resource [Host:1] is unreachable:
>> > Host 1: Unable to start instance due to Unable t
>> > o start VM[DomainRouter|r-504-VM] due to error in finalizeStart,
>> > not
>> > retrying"}
>> > 
>> > 
>> > 
>> > 
>> > 
>> > 
>> 
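
For reference, steps d) and e) of the workaround quoted above boil down to the SQL
below, run against the management server database. This is only a sketch: it assumes
the default 'cloud' schema and MySQL credentials (which vary per installation), and
r-XXX-VM / XXX are placeholders for the affected router, exactly as in the original
post.

  # step d): mark the router VM as running again
  mysql -u cloud -p cloud -e "
    UPDATE vm_instance
       SET state = 'Running', power_state = 'PowerOn'
     WHERE name = 'r-XXX-VM';"

  # step e): set the template/scripts version the 4.6.0 upgrade check expects
  mysql -u cloud -p cloud -e "
    UPDATE domain_router
       SET template_version = 'Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 2015',
           scripts_version  = '546c9e7ac38e0aa16ecc498899dac8e2'
     WHERE id = XXX;"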

Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Posted by Andrei Mikhailovsky <an...@arhont.com>.
Hi Remi,

Is this patch merged into 4.7.1 or 4.8.0, which were recently released? I am planning to do the upgrade and wanted to double check.

Thanks

Andrei
----- Original Message -----
> From: "Remi Bergsma" <RB...@schubergphilis.com>
> To: "dev" <de...@cloudstack.apache.org>
> Sent: Tuesday, 5 January, 2016 11:20:31
> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

> Hi Andrei,
> 
> You indeed need to build CloudStack for this to work.
> 
> You can create packages with the ./packaging/package.sh script in the source tree.
> The PR is against 4.7 and when you create RPMs those will be 4.7.1-SNAPSHOT. I
> do run this in production and it resolved the issue. Let me know if it works
> for you too.
> 
> Regards,
> Remi
> 
> 
> 
> 
> On 05/01/16 10:07, "Andrei Mikhailovsky" <an...@arhont.com> wrote:
> 
>>Hi Remi,
>>
>>I've not tried the patch. I've missed it. Do I need to rebuild the ACS to apply
>>the patch or would making changes to the two files suffice?
>>
>>Thanks
>>
>>Andrei
>>----- Original Message -----
>>> From: "Remi Bergsma" <RB...@schubergphilis.com>
>>> To: "dev" <de...@cloudstack.apache.org>
>>> Sent: Tuesday, 5 January, 2016 05:49:05
>>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>>
>>> Hi Andrei,
>>> 
>>> Did you try it in combination with the patch I created (PR1291)? You need both
>>> changes.
>>> 
>>> Regards, Remi
>>> 
>>> Sent from my iPhone
>>> 
>>>> On 04 Jan 2016, at 22:17, Andrei Mikhailovsky <an...@arhont.com> wrote:
>>>> 
>>>> Hi Remi,
>>>> 
>>>> Thanks for your reply. However, your suggestion of increasing the
>>>> router.aggregation.command.each.timeout didn't help. I've tried setting the
>>>> value to 120 to no avail. It still fails with the same error.
>>>> 
>>>> Andrei
>>>> 
>>>> ----- Original Message -----
>>>>> From: "Remi Bergsma" <RB...@schubergphilis.com>
>>>>> To: "dev" <de...@cloudstack.apache.org>
>>>>> Sent: Monday, 4 January, 2016 10:44:43
>>>>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>>>> 
>>>>> Hi Andrei,
>>>>> 
>>>>> Missed that mail, sorry. I created a PR that allows for longer timeouts [1].
>>>>> 
>>>>> Also, you can bump the router.aggregation.command.each.timeout global setting
>>>>> to, say, 15-30 so the router will be allowed enough time to boot.
>>>>> 
>>>>> Next, we need to find out why it takes so long in the first place. In our
>>>>> environment it at least starts now.
>>>>> 
>>>>> Regards,
>>>>> Remi
>>>>> 
>>>>> [1] https://github.com/apache/cloudstack/pull/1291
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 04/01/16 11:41, "Andrei Mikhailovsky" <an...@arhont.com> wrote:
>>>>>> 
>>>>>> Hello guys,
>>>>>> 
>>>>>> Tried the users mailing list without any luck. Perhaps the dev guys know if
>>>>>> this issue is being looked at for the next release?
>>>>>> 
>>>>>> I've just upgraded to 4.6.2 and have similar issues with three virtual routers
>>>>>> out of 22 in total. They are all failing exactly the same way as described
>>>>>> here.
>>>>>> 
>>>>>> Has anyone found a permanent workaround for this issue?
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> Andrei
>>>>>> 
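
Remi's suggestion above, bumping router.aggregation.command.each.timeout, can be
applied through CloudMonkey (it maps to the updateConfiguration API call) or directly
in the database. A sketch, assuming CloudMonkey is configured against the management
server and the default 'cloud' schema; 30 is simply the upper figure Remi mentions,
and depending on the CloudStack version a management-server restart may be needed
before the new value takes effect:

  # via CloudMonkey / the updateConfiguration API
  cloudmonkey update configuration name=router.aggregation.command.each.timeout value=30

  # or directly in the database
  mysql -u cloud -p cloud -e \
    "UPDATE configuration SET value = '30'
      WHERE name = 'router.aggregation.command.each.timeout';"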

Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Posted by Remi Bergsma <RB...@schubergphilis.com>.
Hi Andrei,

You indeed need to build CloudStack for this to work.

You can create packages with the ./packaging/package.sh script in the source tree. The PR is against 4.7 and when you create RPMs those will be 4.7.1-SNAPSHOT. I do run this in production and it resolved the issue. Let me know if it works for you too.

Regards,
Remi
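
A minimal sketch of that build, assuming a CentOS build host with the usual build
dependencies installed; the distribution flag and its accepted values differ between
CloudStack versions, so check ./packaging/package.sh --help in your checkout:

  git clone https://github.com/apache/cloudstack.git
  cd cloudstack
  git checkout 4.7
  # build the RPMs (the -d value below is an example and is version-dependent)
  ./packaging/package.sh -d centos7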




On 05/01/16 10:07, "Andrei Mikhailovsky" <an...@arhont.com> wrote:

>Hi Remi,
>
>I've not tried the patch. I've missed it. Do I need to rebuild the ACS to apply the patch or would making changes to the two files suffice?
>
>Thanks
>
>Andrei

Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Posted by Andrei Mikhailovsky <an...@arhont.com>.
Hi Remi,

I've not tried the patch. I've missed it. Do I need to rebuild the ACS to apply the patch or would making changes to the two files suffice?

Thanks

Andrei
----- Original Message -----
> From: "Remi Bergsma" <RB...@schubergphilis.com>
> To: "dev" <de...@cloudstack.apache.org>
> Sent: Tuesday, 5 January, 2016 05:49:05
> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

> Hi Andrei,
> 
> Did you try it in combination with the patch I created (PR1291)? You need both
> changes.
> 
> Regards, Remi
> 
> Sent from my iPhone

Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Posted by Remi Bergsma <RB...@schubergphilis.com>.
Hi Andrei,

Did you try it in combination with the patch I created (PR1291)? You need both changes. 

Regards, Remi 

Sent from my iPhone
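
For anyone who wants to test the fix before it appears in a release, a GitHub pull
request can be checked out directly. A sketch, assuming 'origin' points at the
apache/cloudstack GitHub repository and pr1291 is just an arbitrary local branch name:

  cd cloudstack
  # fetch the changes from pull request 1291 onto a local branch
  git fetch origin pull/1291/head:pr1291
  git checkout pr1291
  # then rebuild the packages as described earlier in this thread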

> On 04 Jan 2016, at 22:17, Andrei Mikhailovsky <an...@arhont.com> wrote:
> 
> Hi Remi,
> 
> Thanks for your reply. However, your suggestion of increasing the router.aggregation.command.each.timeout didn't help. I've tried setting the value to 120 to no avail. It still fails with the same error.
> 
> Andrei

Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Posted by Andrei Mikhailovsky <an...@arhont.com>.
Hi Remi,

Thanks for your reply. However, your suggestion of increasing the router.aggregation.command.each.timeout didn't help. I've tried setting the value to 120 to no avail. It still fails with the same error.

Andrei

----- Original Message -----
> From: "Remi Bergsma" <RB...@schubergphilis.com>
> To: "dev" <de...@cloudstack.apache.org>
> Sent: Monday, 4 January, 2016 10:44:43
> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

> Hi Andrei,
> 
> Missed that mail, sorry. I created a PR that allows for longer timeouts [1].
> 
> Also, you can bump the router.aggregation.command.each.timeout global setting
> to, say, 15-30 so the router will be allowed enough time to boot.
> 
> Next, we need to find out why it takes so long in the first place. In our
> environment it at least starts now.
> 
> Regards,
> Remi
> 
> [1] https://github.com/apache/cloudstack/pull/1291
>>>> > 2015-11-24 11:24:33,810  CsHelper.py execute:160 Executing: service
>>>> > dnsmasq restart
>>>> > 
>>>> > ==> /var/log/messages <==
>>>> > Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system
>>>> > halt
>>>> > 
>>>> > Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
>>>> > 
>>>> > The system is going down for system halt NOW!
>>>> > Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
>>>> > 
>>>> > ==> /var/log/cloud.log <==
>>>> > /opt/cloud/bin/vr_cfg.sh: line 60:  6603
>>>> > Killed                  /opt/cloud/bin/update_config.py
>>>> > vm_dhcp_entry.json
>>>> > 
>>>> > ==> /var/log/messages <==
>>>> > Nov 24 11:24:35 r-504-VM cloud: VR config: executing
>>>> > failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>> > 
>>>> > ==> /var/log/cloud.log <==
>>>> > Tue Nov 24 11:24:35 UTC 2015 : VR config: executing
>>>> > failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>> > Connection to 169.254.2.192 closed by remote host.
>>>> > Connection to 169.254.2.192 closed.
>>>> > 
>>>> > 
>>>> > the management-server.log shows
>>>> > 
>>>> > 2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>>>> > (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing
>>>> > com.cloud.vm.VmWorkStart for job-5164
>>>> > 2015-11-24 12:24:43,017 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
>>>> > (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job
>>>> > -5164
>>>> > from job monitoring
>>>> > 2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher]
>>>> > (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception
>>>> > while
>>>> > executing org.apache.cloudstack.api.command.admin.
>>>> > router.StartRouterCmd
>>>> > com.cloud.exception.AgentUnavailableException: Resource [Host:1] is
>>>> > unreachable: Host 1: Unable to start instance due to Unable to
>>>> > start
>>>> > VM[DomainRouter|r-504-VM] due to error in f
>>>> > inalizeStart, not retrying
>>>> >         at
>>>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>>> > ineManagerImpl.java:1121)
>>>> >         at
>>>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>>> > ineManagerImpl.java:4580)
>>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>> > Method)
>>>> >         at
>>>> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImp
>>>> > l.java:57)
>>>> >         at
>>>> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcc
>>>> > essorImpl.java:43)
>>>> >         at java.lang.reflect.Method.invoke(Method.java:606)
>>>> >         at
>>>> > com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandler
>>>> > Proxy.java:107)
>>>> >         at
>>>> > com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachi
>>>> > neManagerImpl.java:4736)
>>>> >         at
>>>> > com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:10
>>>> > 2)
>>>> >         at
>>>> > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>>>> > $5.runInContext(AsyncJobManagerImpl.java:537)
>>>> >         at
>>>> > org.apache.cloudstack.managed.context.ManagedContextRunnable
>>>> > $1.run(ManagedContextRunnable.java:49)
>>>> >         at
>>>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext
>>>> > $1.call(DefaultManagedContext.java:56)
>>>> >         at
>>>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ca
>>>> > llWithContext(DefaultManagedContext.java:103)
>>>> >         at
>>>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ru
>>>> > nWithContext(DefaultManagedContext.java:53)
>>>> >         at
>>>> > org.apache.cloudstack.managed.context.ManagedContextRunnable.run(Ma
>>>> > nagedContextRunnable.java:46)
>>>> >         at
>>>> > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>>>> > $5.run(AsyncJobManagerImpl.java:494)
>>>> >         at java.util.concurrent.Executors
>>>> > $RunnableAdapter.call(Executors.java:471)
>>>> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> >         at
>>>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecuto
>>>> > r.java:1145)
>>>> >         at java.util.concurrent.ThreadPoolExecutor
>>>> > $Worker.run(ThreadPoolExecutor.java:615)
>>>> >         at java.lang.Thread.run(Thread.java:745)
>>>> > Caused by: com.cloud.utils.exception.ExecutionException: Unable to
>>>> > start
>>>> > VM[DomainRouter|r-504-VM] due to error in finalizeStart, not
>>>> > retrying
>>>> >         at
>>>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>>> > ineManagerImpl.java:1085)
>>>> >         at
>>>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>>> > ineManagerImpl.java:4580)
>>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>> > Method)
>>>> >         ... 18 more
>>>> > 2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>>>> > (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163,
>>>> > jobStatus: FAILED, resultCode: 530, result: org.
>>>> > apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[
>>>> > ],"errorcode":530,"errortext":"Resource [Host:1] is unreachable:
>>>> > Host 1: Unable to start instance due to Unable t
>>>> > o start VM[DomainRouter|r-504-VM] due to error in finalizeStart,
>>>> > not
>>>> > retrying"}
>>>> > 
>>>> > 
>>>> > 
>>>> > 
>>>> > 
>>>> > 

Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Posted by Remi Bergsma <RB...@schubergphilis.com>.
Hi Andrei,

Missed that mail, sorry. I created a PR that allows for longer timeouts [1].

Also, you can bump the router.aggregation.command.each.timeout global setting to, say, 15-30 so the router is given enough time to boot.

Next, we need to find out why it takes so long in the first place. In our environment it at least starts now.
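
One way to see where the time actually goes (just a sketch, not an official procedure: the json file name is the one from Stephan's logs, re-running the script re-applies that config, and the ssh port/key below are the usual system VM defaults, so adjust as needed):

# from the hypervisor, log into the router over its link-local address
ssh -p 3922 -i /root/.ssh/id_rsa.cloud root@169.254.2.192

# on the router, time a single config run by hand
time /opt/cloud/bin/update_config.py vm_dhcp_entry.json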

Regards,
Remi

[1] https://github.com/apache/cloudstack/pull/1291





On 04/01/16 11:41, "Andrei Mikhailovsky" <an...@arhont.com> wrote:

>Hello guys,
>
>Tried the user's mailing list without any luck. Perhaps the dev guys know if this issue is being looked at for the next release?
>
>I've just upgraded to 4.6.2 and have similar issues with three virtual routers out of 22 in total. They are all failing exactly the same way as described here.
>
>Has anyone found a permanent workaround for this issue?
>
>Thanks
>
>Andrei
>
>----- Original Message -----
>> From: "Stephan Seitz" <s....@secretresearchfacility.com>
>> To: "users" <us...@cloudstack.apache.org>
>> Sent: Monday, 30 November, 2015 19:53:57
>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>
>> Does anybody else experiemce problems due to (very) slow deployment of
>> VRs?
>> 
>> 
>> Am Dienstag, den 24.11.2015, 16:31 +0100 schrieb Stephan Seitz:
>>> Update / FYI:
>>> After faking the particular VRu in sql, I tried to restart that
>>> network,
>>> and it always fails. To me it looks like the update_config.py - which
>>> takes almost all cpu ressources - runs way longer any watchdog will
>>> accept.
>>> 
>>> I'm able to mitigate that by very nasty workarounds:
>>> a) start the router
>>> b) wait until its provisioned
>>> c) restart cloudstack-management
>>> d)  update vm_instance
>>> set state='Running',
>>> power_state='PowerOn' where name = 'r-XXX-VM';
>>> e) once: update domain_router
>>> set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC
>>> 2015",
>>> scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
>>> where id=XXX;
>>> f) wait until update_config.py finishes (for me thats about 15
>>> minutes)
>>> 
>>> Since I expect the need for VR restarts in the future, this behaviour
>>> is
>>> somehow unsatisfying. It needs a lot of errorprone intervention.
>>> 
>>> I'm quite unsure if it's introduced with the update or the particular
>>> VR
>>> just has simply not been restarted after getting configured with lots
>>> of
>>> ips and rules.
>>> 
>>> 
>>> Am Dienstag, den 24.11.2015, 12:29 +0100 schrieb Stephan Seitz:
>>> > Hi List!
>>> > 
>>> > After upgrading from 4.5.2 to 4.6.0 I faced a problem with one
>>> > virtualrouter. This particular VR has about 10 IPs w/ LB and FW
>>> > rules
>>> > defined. During the upgrade process, and after about 4-5 minutes a
>>> > watchdog kicks in and kills the respective VR due to no response.
>>> > 
>>> > So far I didn't find any timeout value in the global settings.
>>> > Temporarily setting network.router.EnableServiceMonitoring to false
>>> > doesn't change the behaviour.
>>> > 
>>> > Any help, how to mitigate that nasty timeout would be really
>>> > appreciated :)
>>> > 
>>> > cheers,
>>> > 
>>> > Stephan
>>> > 
>>> > From within the VR, the logs show
>>> > 
>>> > 2015-11-24 11:24:33,807  CsFile.py search:123 Searching for
>>> > dhcp-range=interface:eth0,set:interface and replacing with
>>> > dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
>>> > 2015-11-24 11:24:33,808  merge.py load:56 Creating data bag type
>>> > guestnetwork
>>> > 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>>> > dhcp-option=tag:interface-eth0,15 and replacing with
>>> > dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
>>> > 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>>> > dhcp-option=tag:interface-eth0,6 and replacing with
>>> > dhcp-option=tag:interface
>>> > -eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
>>> > 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>>> > dhcp-option=tag:interface-eth0,3, and replacing with
>>> > dhcp-option=tag:interface-eth0,3,10.10.22.1
>>> > 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>>> > dhcp-option=tag:interface-eth0,1, and replacing with
>>> > dhcp-option=tag:interface-eth0,1,255.255.255.0
>>> > 2015-11-24 11:24:33,810  CsHelper.py execute:160 Executing: service
>>> > dnsmasq restart
>>> > 
>>> > ==> /var/log/messages <==
>>> > Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system
>>> > halt
>>> > 
>>> > Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
>>> > 
>>> > The system is going down for system halt NOW!
>>> > Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
>>> > 
>>> > ==> /var/log/cloud.log <==
>>> > /opt/cloud/bin/vr_cfg.sh: line 60:  6603
>>> > Killed                  /opt/cloud/bin/update_config.py
>>> > vm_dhcp_entry.json
>>> > 
>>> > ==> /var/log/messages <==
>>> > Nov 24 11:24:35 r-504-VM cloud: VR config: executing
>>> > failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>> > 
>>> > ==> /var/log/cloud.log <==
>>> > Tue Nov 24 11:24:35 UTC 2015 : VR config: executing
>>> > failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>> > Connection to 169.254.2.192 closed by remote host.
>>> > Connection to 169.254.2.192 closed.
>>> > 
>>> > 
>>> > the management-server.log shows
>>> > 
>>> > 2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>>> > (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing
>>> > com.cloud.vm.VmWorkStart for job-5164
>>> > 2015-11-24 12:24:43,017 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
>>> > (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job
>>> > -5164
>>> > from job monitoring
>>> > 2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher]
>>> > (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception
>>> > while
>>> > executing org.apache.cloudstack.api.command.admin.
>>> > router.StartRouterCmd
>>> > com.cloud.exception.AgentUnavailableException: Resource [Host:1] is
>>> > unreachable: Host 1: Unable to start instance due to Unable to
>>> > start
>>> > VM[DomainRouter|r-504-VM] due to error in f
>>> > inalizeStart, not retrying
>>> >         at
>>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>> > ineManagerImpl.java:1121)
>>> >         at
>>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>> > ineManagerImpl.java:4580)
>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>> > Method)
>>> >         at
>>> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImp
>>> > l.java:57)
>>> >         at
>>> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcc
>>> > essorImpl.java:43)
>>> >         at java.lang.reflect.Method.invoke(Method.java:606)
>>> >         at
>>> > com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandler
>>> > Proxy.java:107)
>>> >         at
>>> > com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachi
>>> > neManagerImpl.java:4736)
>>> >         at
>>> > com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:10
>>> > 2)
>>> >         at
>>> > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>>> > $5.runInContext(AsyncJobManagerImpl.java:537)
>>> >         at
>>> > org.apache.cloudstack.managed.context.ManagedContextRunnable
>>> > $1.run(ManagedContextRunnable.java:49)
>>> >         at
>>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext
>>> > $1.call(DefaultManagedContext.java:56)
>>> >         at
>>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ca
>>> > llWithContext(DefaultManagedContext.java:103)
>>> >         at
>>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ru
>>> > nWithContext(DefaultManagedContext.java:53)
>>> >         at
>>> > org.apache.cloudstack.managed.context.ManagedContextRunnable.run(Ma
>>> > nagedContextRunnable.java:46)
>>> >         at
>>> > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>>> > $5.run(AsyncJobManagerImpl.java:494)
>>> >         at java.util.concurrent.Executors
>>> > $RunnableAdapter.call(Executors.java:471)
>>> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> >         at
>>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecuto
>>> > r.java:1145)
>>> >         at java.util.concurrent.ThreadPoolExecutor
>>> > $Worker.run(ThreadPoolExecutor.java:615)
>>> >         at java.lang.Thread.run(Thread.java:745)
>>> > Caused by: com.cloud.utils.exception.ExecutionException: Unable to
>>> > start
>>> > VM[DomainRouter|r-504-VM] due to error in finalizeStart, not
>>> > retrying
>>> >         at
>>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>> > ineManagerImpl.java:1085)
>>> >         at
>>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>> > ineManagerImpl.java:4580)
>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>> > Method)
>>> >         ... 18 more
>>> > 2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>>> > (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163,
>>> > jobStatus: FAILED, resultCode: 530, result: org.
>>> > apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[
>>> > ],"errorcode":530,"errortext":"Resource [Host:1] is unreachable:
>>> > Host 1: Unable to start instance due to Unable t
>>> > o start VM[DomainRouter|r-504-VM] due to error in finalizeStart,
>>> > not
>>> > retrying"}
>>> > 
>>> > 
>>> > 
>>> > 
>>> > 
>>> > 
>>>

Fwd: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Posted by Andrei Mikhailovsky <an...@arhont.com>.
Hello guys,

Tried the users mailing list without any luck. Perhaps the dev guys know if this issue is being looked at for the next release?

I've just upgraded to 4.6.2 and have similar issues with three virtual routers out of 22 in total. They are all failing exactly the same way as described here.

Has anyone found a permanent workaround for this issue?

Thanks

Andrei

----- Original Message -----
> From: "Stephan Seitz" <s....@secretresearchfacility.com>
> To: "users" <us...@cloudstack.apache.org>
> Sent: Monday, 30 November, 2015 19:53:57
> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

> Does anybody else experiemce problems due to (very) slow deployment of
> VRs?
> 
> 
> Am Dienstag, den 24.11.2015, 16:31 +0100 schrieb Stephan Seitz:
>> Update / FYI:
>> After faking the particular VRu in sql, I tried to restart that
>> network,
>> and it always fails. To me it looks like the update_config.py - which
>> takes almost all cpu ressources - runs way longer any watchdog will
>> accept.
>> 
>> I'm able to mitigate that by very nasty workarounds:
>> a) start the router
>> b) wait until its provisioned
>> c) restart cloudstack-management
>> d)  update vm_instance
>> set state='Running',
>> power_state='PowerOn' where name = 'r-XXX-VM';
>> e) once: update domain_router
>> set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC
>> 2015",
>> scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
>> where id=XXX;
>> f) wait until update_config.py finishes (for me thats about 15
>> minutes)
>> 
>> Since I expect the need for VR restarts in the future, this behaviour
>> is
>> somehow unsatisfying. It needs a lot of errorprone intervention.
>> 
>> I'm quite unsure if it's introduced with the update or the particular
>> VR
>> just has simply not been restarted after getting configured with lots
>> of
>> ips and rules.
>> 
>> 
>> Am Dienstag, den 24.11.2015, 12:29 +0100 schrieb Stephan Seitz:
>> > Hi List!
>> > 
>> > After upgrading from 4.5.2 to 4.6.0 I faced a problem with one
>> > virtualrouter. This particular VR has about 10 IPs w/ LB and FW
>> > rules
>> > defined. During the upgrade process, and after about 4-5 minutes a
>> > watchdog kicks in and kills the respective VR due to no response.
>> > 
>> > So far I didn't find any timeout value in the global settings.
>> > Temporarily setting network.router.EnableServiceMonitoring to false
>> > doesn't change the behaviour.
>> > 
>> > Any help, how to mitigate that nasty timeout would be really
>> > appreciated :)
>> > 
>> > cheers,
>> > 
>> > Stephan
>> > 
>> > From within the VR, the logs show
>> > 
>> > 2015-11-24 11:24:33,807  CsFile.py search:123 Searching for
>> > dhcp-range=interface:eth0,set:interface and replacing with
>> > dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
>> > 2015-11-24 11:24:33,808  merge.py load:56 Creating data bag type
>> > guestnetwork
>> > 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,15 and replacing with
>> > dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
>> > 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,6 and replacing with
>> > dhcp-option=tag:interface
>> > -eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
>> > 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,3, and replacing with
>> > dhcp-option=tag:interface-eth0,3,10.10.22.1
>> > 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,1, and replacing with
>> > dhcp-option=tag:interface-eth0,1,255.255.255.0
>> > 2015-11-24 11:24:33,810  CsHelper.py execute:160 Executing: service
>> > dnsmasq restart
>> > 
>> > ==> /var/log/messages <==
>> > Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system
>> > halt
>> > 
>> > Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
>> > 
>> > The system is going down for system halt NOW!
>> > Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
>> > 
>> > ==> /var/log/cloud.log <==
>> > /opt/cloud/bin/vr_cfg.sh: line 60:  6603
>> > Killed                  /opt/cloud/bin/update_config.py
>> > vm_dhcp_entry.json
>> > 
>> > ==> /var/log/messages <==
>> > Nov 24 11:24:35 r-504-VM cloud: VR config: executing
>> > failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>> > 
>> > ==> /var/log/cloud.log <==
>> > Tue Nov 24 11:24:35 UTC 2015 : VR config: executing
>> > failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>> > Connection to 169.254.2.192 closed by remote host.
>> > Connection to 169.254.2.192 closed.
>> > 
>> > 
>> > the management-server.log shows
>> > 
>> > 2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> > (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing
>> > com.cloud.vm.VmWorkStart for job-5164
>> > 2015-11-24 12:24:43,017 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
>> > (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job
>> > -5164
>> > from job monitoring
>> > 2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher]
>> > (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception
>> > while
>> > executing org.apache.cloudstack.api.command.admin.
>> > router.StartRouterCmd
>> > com.cloud.exception.AgentUnavailableException: Resource [Host:1] is
>> > unreachable: Host 1: Unable to start instance due to Unable to
>> > start
>> > VM[DomainRouter|r-504-VM] due to error in f
>> > inalizeStart, not retrying
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>> > ineManagerImpl.java:1121)
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>> > ineManagerImpl.java:4580)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> > Method)
>> >         at
>> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImp
>> > l.java:57)
>> >         at
>> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcc
>> > essorImpl.java:43)
>> >         at java.lang.reflect.Method.invoke(Method.java:606)
>> >         at
>> > com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandler
>> > Proxy.java:107)
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachi
>> > neManagerImpl.java:4736)
>> >         at
>> > com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:10
>> > 2)
>> >         at
>> > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>> > $5.runInContext(AsyncJobManagerImpl.java:537)
>> >         at
>> > org.apache.cloudstack.managed.context.ManagedContextRunnable
>> > $1.run(ManagedContextRunnable.java:49)
>> >         at
>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext
>> > $1.call(DefaultManagedContext.java:56)
>> >         at
>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ca
>> > llWithContext(DefaultManagedContext.java:103)
>> >         at
>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ru
>> > nWithContext(DefaultManagedContext.java:53)
>> >         at
>> > org.apache.cloudstack.managed.context.ManagedContextRunnable.run(Ma
>> > nagedContextRunnable.java:46)
>> >         at
>> > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>> > $5.run(AsyncJobManagerImpl.java:494)
>> >         at java.util.concurrent.Executors
>> > $RunnableAdapter.call(Executors.java:471)
>> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >         at
>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecuto
>> > r.java:1145)
>> >         at java.util.concurrent.ThreadPoolExecutor
>> > $Worker.run(ThreadPoolExecutor.java:615)
>> >         at java.lang.Thread.run(Thread.java:745)
>> > Caused by: com.cloud.utils.exception.ExecutionException: Unable to
>> > start
>> > VM[DomainRouter|r-504-VM] due to error in finalizeStart, not
>> > retrying
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>> > ineManagerImpl.java:1085)
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>> > ineManagerImpl.java:4580)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> > Method)
>> >         ... 18 more
>> > 2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> > (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163,
>> > jobStatus: FAILED, resultCode: 530, result: org.
>> > apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[
>> > ],"errorcode":530,"errortext":"Resource [Host:1] is unreachable:
>> > Host 1: Unable to start instance due to Unable t
>> > o start VM[DomainRouter|r-504-VM] due to error in finalizeStart,
>> > not
>> > retrying"}
>> > 
>> > 
>> > 
>> > 
>> > 
>> > 
>>

Re: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Posted by Daniel Mezentsev <da...@soleks.com>.
  Hi All,

Sorry I didn't participate in the discussion earlier. I ran into a pretty big
issue with the cloud yesterday. A very interesting one.
I have 2 networks with 2 dedicated VRs (not VPC). One network carries the
default gateway; the second one is just for internal interconnect between VMs.
Both networks use the IsolatedWithNoSNAT service, so basically DHCP only. Both
networks started fine. I then started firing up VMs and got the default gateway
on the second NIC (it is configured to be on the first). Doing some
troubleshooting, I found that the default gateway was overwritten when the
second interface was activated. From my understanding, the VR of a network not
designated as "default" should not provide the DHCP option "router", but it did.
More than that, it provided the wrong default gateway. In my configuration the
default gateway points to a device in the reserved IP range (an external
router), but the VR pushed its own IP address as the default gw, so the network
was completely down.
Here are some numbers:

network 1 - 10.1.12.0/24, range 10.1.12.1-240, default gw 10.1.12.254
network 2 - 10.1.22.0/24, range 10.1.22.1-240, default gw 10.1.22.254 (I have
to specify it here, but I don't really need it; moreover, that IP is down)

NIC1 on all VMs in network1, NIC1 is default
NIC2 on all VMs in network2

But I got the default gw from network2, and it was 10.1.22.1, which is not in
the configuration at all.
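
To see what the router is actually handing out, the dnsmasq config on the
network2 VR can be checked directly (a rough sketch; the file path is an
assumption and may differ per template, the option 3 lines are the ones
CsFile.py rewrites in the logs quoted earlier in this thread):

# on the r-XXX-VM of network2: option 3 is the default route dnsmasq offers
grep 'dhcp-option=tag:interface' /etc/dnsmasq.d/cloud.conf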

That is very confusing. I wasn't able to fix that issue, so I removed
network2 completely.

The configuration above was working with 4.5 and 4.4, so it points to
something being definitely wrong with the 4.6 system VM image.

Increasing the time-out is more of a workaround than a real fix for the
problem. There is no way a router should take 4-5 minutes to boot.
 

> The routers from Daniel work when we bumped global setting
> router.aggregation.command.each.timeout to 20 or higher. You may need to
> upgrade to 4.7.1 for that to work, but that's a good thing to do anyway
> because it has a load of fixes over 4.6. The upgrade is quick and
> painless.
>
> For the record, it doesn't have to do with the template. Be happy we
> reuse it, makes upgrading way easier.
>
> Regards, Remi
>
> Sent from my iPhone
>
>> On 28 Jan 2016, at 07:12, Daniel Mezentsev <da...@soleks.com> wrote:
>>
>> 4.7.1 also affected.
>> All versions use 4.6 templates.
>>
>> P.S.: would it be usefull to move discussion to devs list. Looks like
>> it's
>> pretty big issue.
>>
>>> Hi!
>>>
>>> After upgrading from 4.5.2 to 4.6.2, I also face this issue. We use
both
>>> KVM and XenServer. Any stable workaround for this?
>>>
>>> Thanx
>>>
>>> Martin
>>>
>>> -----Ursprüngliche Nachricht-----
>>> Von: Andrei Mikhailovsky [mailto:andrei@arhont.com]
>>> Gesendet: Freitag, 18. Dezember 2015 02:55
>>> An: users <us...@cloudstack.apache.org>
>>> Betreff: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>>>
>>> Hello guys,
>>>
>>> I've just upgraded to 4.6.2 and have similar issues with three virtual
>>> routers out of 22 in total. They are all failing exactly the same way
as
>>> described here.
>>>
>>> Has anyone found a permanent workaround for this issue?
>>>
>>> Thanks
>>>
>>> Andrei
>>>
>>> ----- Original Message -----
>>>> From: "Stephan Seitz" <s....@secretresearchfacility.com>
>>>> To: "users" <us...@cloudstack.apache.org>
>>>> Sent: Monday, 30 November, 2015 19:53:57
>>>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>>>
>>>> Does anybody else experiemce problems due to (very) slow deployment of
>>>> VRs?
>>>>
>>>> Am Dienstag, den 24.11.2015, 16:31 +0100 schrieb Stephan Seitz:
>>>>> Update / FYI:
>>>>> After faking the particular VRu in sql, I tried to restart that
>>>>> network, and it always fails. To me it looks like the
>>>>> update_config.py - which takes almost all cpu ressources - runs way
>>>>> longer any watchdog will accept.
>>>>>
>>>>> I'm able to mitigate that by very nasty workarounds:
>>>>> a) start the router
>>>>> b) wait until its provisioned
>>>>> c) restart cloudstack-management
>>>>> d)  update vm_instance
>>>>> set state='Running',
>>>>> power_state='PowerOn' where name = 'r-XXX-VM';
>>>>> e) once: update domain_router
>>>>> set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC
>>>>> 2015", scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
>>>>> where id=XXX;
>>>>> f) wait until update_config.py finishes (for me thats about 15
>>>>> minutes)
>>>>>
>>>>> Since I expect the need for VR restarts in the future, this behaviour
>>>>> is somehow unsatisfying. It needs a lot of errorprone intervention.
>>>>>
>>>>> I'm quite unsure if it's introduced with the update or the particular
>>>>> VR just has simply not been restarted after getting configured with
>>>>> lots of ips and rules.
>>>>>
>>>>> Am Dienstag, den 24.11.2015, 12:29 +0100 schrieb Stephan Seitz:
>>>>> Hi List!
>>>>>
>>>>> After upgrading from 4.5.2 to 4.6.0 I faced a problem with one
>>>>> virtualrouter. This particular VR has about 10 IPs w/ LB and FW
>>>>> rules
>>>>> defined. During the upgrade process, and after about 4-5 minutes a
>>>>> watchdog kicks in and kills the respective VR due to no response.
>>>>>
>>>>> So far I didn't find any timeout value in the global settings.
>>>>> Temporarily setting network.router.EnableServiceMonitoring to false
>>>>> doesn't change the behaviour.
>>>>>
>>>>> Any help, how to mitigate that nasty timeout would be really
>>>>> appreciated :)
>>>>>
>>>>> cheers,
>>>>>
>>>>> Stephan
>>>>>
>>>>> From within the VR, the logs show
>>>>>
>>>>> 2015-11-24 11:24:33,807  CsFile.py search:123 Searching for
>>>>> dhcp-range=interface:eth0,set:interface and replacing with
>>>>> dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
>>>>> 2015-11-24 11:24:33,808  merge.py load:56 Creating data bag type
>>>>> guestnetwork
>>>>> 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>>>>> dhcp-option=tag:interface-eth0,15 and replacing with
>>>>> dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
>>>>> 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>>>>> dhcp-option=tag:interface-eth0,6 and replacing with
>>>>> dhcp-option=tag:interface
>>>>> -eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
>>>>> 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>>>>> dhcp-option=tag:interface-eth0,3, and replacing with
>>>>> dhcp-option=tag:interface-eth0,3,10.10.22.1
>>>>> 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>>>>> dhcp-option=tag:interface-eth0,1, and replacing with
>>>>> dhcp-option=tag:interface-eth0,1,255.255.255.0
>>>>> 2015-11-24 11:24:33,810  CsHelper.py execute:160 Executing: service
>>>>> dnsmasq restart
>>>>>
>>>>> ==> /var/log/messages <==
>>>>> Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system
>>>>> halt
>>>>>
>>>>> Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
>>>>>
>>>>> The system is going down for system halt NOW!
>>>>> Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
>>>>>
>>>>> ==> /var/log/cloud.log <==
>>>>> /opt/cloud/bin/vr_cfg.sh: line 60:  6603
>>>>> Killed                  /opt/cloud/bin/update_config.py
>>>>> vm_dhcp_entry.json
>>>>>
>>>>> ==> /var/log/messages <==
>>>>> Nov 24 11:24:35 r-504-VM cloud: VR config: executing
>>>>> failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>>>
>>>>> ==> /var/log/cloud.log <==
>>>>> Tue Nov 24 11:24:35 UTC 2015 : VR config: executing
>>>>> failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>>> Connection to 169.254.2.192 closed by remote host.
>>>>> Connection to 169.254.2.192 closed.
>>>>>
>>>>> the management-server.log shows
>>>>>
>>>>> 2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>>>>> (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing
>>>>> com.cloud.vm.VmWorkStart for job-5164
>>>>> 2015-11-24 12:24:43,017 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
>>>>> (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job
>>>>> -5164
>>>>> from job monitoring
>>>>> 2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher]
>>>>> (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception
>>>>> while
>>>>> executing org.apache.cloudstack.api.command.admin.
>>>>> router.StartRouterCmd
>>>>> com.cloud.exception.AgentUnavailableException: Resource [Host:1] is
>>>>> unreachable: Host 1: Unable to start instance due to Unable to
>>>>> start
>>>>> VM[DomainRouter|r-504-VM] due to error in f
>>>>> inalizeStart, not retrying
>>>>>          at
>>>>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>>>> ineManagerImpl.java:1121)
>>>>>          at
>>>>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>>>> ineManagerImpl.java:4580)
>>>>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>>> Method)
>>>>>          at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImp
>>>>> l.java:57)
>>>>>          at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcc
>>>>> essorImpl.java:43)
>>>>>          at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>          at
>>>>> com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandler
>>>>> Proxy.java:107)
>>>>>          at
>>>>> com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachi
>>>>> neManagerImpl.java:4736)
>>>>>          at
>>>>> com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:10
>>>>> 2)
>>>>>          at
>>>>> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>>>>> $5.runInContext(AsyncJobManagerImpl.java:537)
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.ManagedContextRunnable
>>>>> $1.run(ManagedContextRunnable.java:49)
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext
>>>>> $1.call(DefaultManagedContext.java:56)
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ca
>>>>> llWithContext(DefaultManagedContext.java:103)
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ru
>>>>> nWithContext(DefaultManagedContext.java:53)
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(Ma
>>>>> nagedContextRunnable.java:46)
>>>>>          at
>>>>> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>>>>> $5.run(AsyncJobManagerImpl.java:494)
>>>>>          at java.util.concurrent.Executors
>>>>> $RunnableAdapter.call(Executors.java:471)
>>>>>          at
>>
>> java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecuto
>>>>> r.java:1145)
>>>>>          at java.util.concurrent.ThreadPoolExecutor
>>>>> $Worker.run(ThreadPoolExecutor.java:615)
>>>>>          at java.lang.Thread.run(Thread.java:745)
>>>>> Caused by: com.cloud.utils.exception.ExecutionException: Unable to
>>>>> start
>>>>> VM[DomainRouter|r-504-VM] due to error in finalizeStart, not
>>>>> retrying
>>>>>          at
>>>>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>>>> ineManagerImpl.java:1085)
>>>>>          at
>>>>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>>>> ineManagerImpl.java:4580)
>>>>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>>> Method)
>>>>>          ... 18 more
>>>>> 2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>>>>> (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163,
>>>>> jobStatus: FAILED, resultCode: 530, result: org.
>>>>> apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[
>>>>> ],"errorcode":530,"errortext":"Resource [Host:1] is unreachable:
>>>>> Host 1: Unable to start instance due to Unable t
>>>>> o start VM[DomainRouter|r-504-VM] due to error in finalizeStart,
>>>>> not
>>>>> retrying"}
>>
>> Daniel Mezentsev, founder
>> (+1) 604 313 8592.
>> Soleks Data Group.Shaping the clouds.
>
>  
  Daniel Mezentsev, founder
(+1) 604 313 8592.
Soleks Data Group.
Shaping the clouds.

AW: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Posted by Martin Emrich <ma...@empolis.com>.
Hi!

To follow up:

- I upgraded to 4.7.0 (there are no 4.7.1 el6 RPMs yet, neither from CloudStack nor from Shapeblue)
- The problem still persists
- It seems that VRs can be created on XenServer, but not on KVM. I tried forcing new VRs onto XenServer hosts only via host tags, but the decision to use KVM is made before the tags are evaluated, so no suitable hosts are left once KVM has been chosen.
- I see this in the KVM host agent.log:
2016-01-28 13:29:10,956 WARN  [kvm.resource.LibvirtComputingResource] (Script-2:null) (logid:) Interrupting script.
2016-01-28 13:29:10,958 WARN  [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) (logid:7d8981d1) Timed out: /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh vr_cfg.sh 169.254.3.252 -c /var/cache/cloud/VR-0193b95c-96ee-4e0d-a527-964960baa49e.cfg .  Output is:
- router.aggregation.command.each.timeout is set to a whopping 600s, but the error appears ca. 1-2 minutes after I click on "restart network" with cleanup=true (a way to time the failing command by hand is sketched below).
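
In case it helps, the command from agent.log can be re-run by hand on the KVM host to see how long the VR config really takes (only a sketch: it re-applies the router config, and the .cfg file may already have been cleaned up from /var/cache/cloud):

# time the exact command the agent timed out on (copied from agent.log above)
time /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh \
    vr_cfg.sh 169.254.3.252 -c /var/cache/cloud/VR-0193b95c-96ee-4e0d-a527-964960baa49e.cfg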


Any Ideas?

Thanks

Martin



Re: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Posted by Remi Bergsma <RB...@schubergphilis.com>.
The routers from Daniel work when we bumped the global setting router.aggregation.command.each.timeout to 20 or higher. You may need to upgrade to 4.7.1 for that to work, but that's a good thing to do anyway because it has a load of fixes over 4.6. The upgrade is quick and painless.

For the record, it doesn't have anything to do with the template. Be happy we reuse it; it makes upgrading way easier.

Regards, Remi 

Sent from my iPhone

> On 28 Jan 2016, at 07:12, Daniel Mezentsev <da...@soleks.com> wrote:
> 
> 4.7.1 also affected.
> All versions use 4.6 templates.
> 
> P.S.: would it be usefull to move discussion to devs list. Looks like it's
> pretty big issue.
> 
>> Hi!
>> 
>> After upgrading from 4.5.2 to 4.6.2, I also face this issue. We use both
>> KVM and XenServer. Any stable workaround for this?
>> 
>> Thanx
>> 
>> Martin
>> 
>> -----Ursprüngliche Nachricht-----
>> Von: Andrei Mikhailovsky [mailto:andrei@arhont.com]
>> Gesendet: Freitag, 18. Dezember 2015 02:55
>> An: users <us...@cloudstack.apache.org>
>> Betreff: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>> 
>> Hello guys,
>> 
>> I've just upgraded to 4.6.2 and have similar issues with three virtual
>> routers out of 22 in total. They are all failing exactly the same way as
>> described here.
>> 
>> Has anyone found a permanent workaround for this issue?
>> 
>> Thanks
>> 
>> Andrei
>> 
>> ----- Original Message -----
>>> From: "Stephan Seitz" <s....@secretresearchfacility.com>
>>> To: "users" <us...@cloudstack.apache.org>
>>> Sent: Monday, 30 November, 2015 19:53:57
>>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>> 
>>> Does anybody else experiemce problems due to (very) slow deployment of
>>> VRs?
>>> 
>>> Am Dienstag, den 24.11.2015, 16:31 +0100 schrieb Stephan Seitz:
>>>> Update / FYI:
>>>> After faking the particular VRu in sql, I tried to restart that
>>>> network, and it always fails. To me it looks like the
>>>> update_config.py - which takes almost all cpu ressources - runs way
>>>> longer any watchdog will accept.
>>>> 
>>>> I'm able to mitigate that by very nasty workarounds:
>>>> a) start the router
>>>> b) wait until its provisioned
>>>> c) restart cloudstack-management
>>>> d)  update vm_instance
>>>> set state='Running',
>>>> power_state='PowerOn' where name = 'r-XXX-VM';
>>>> e) once: update domain_router
>>>> set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC
>>>> 2015", scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
>>>> where id=XXX;
>>>> f) wait until update_config.py finishes (for me thats about 15
>>>> minutes)
>>>> 
>>>> Since I expect the need for VR restarts in the future, this behaviour
>>>> is somehow unsatisfying. It needs a lot of errorprone intervention.
>>>> 
>>>> I'm quite unsure if it's introduced with the update or the particular
>>>> VR just has simply not been restarted after getting configured with
>>>> lots of ips and rules.
>>>> 
>>>> Am Dienstag, den 24.11.2015, 12:29 +0100 schrieb Stephan Seitz:
>>>> Hi List!
>>>> 
>>>> After upgrading from 4.5.2 to 4.6.0 I faced a problem with one
>>>> virtualrouter. This particular VR has about 10 IPs w/ LB and FW
>>>> rules
>>>> defined. During the upgrade process, and after about 4-5 minutes a
>>>> watchdog kicks in and kills the respective VR due to no response.
>>>> 
>>>> So far I didn't find any timeout value in the global settings.
>>>> Temporarily setting network.router.EnableServiceMonitoring to false
>>>> doesn't change the behaviour.
>>>> 
>>>> Any help, how to mitigate that nasty timeout would be really
>>>> appreciated :)
>>>> 
>>>> cheers,
>>>> 
>>>> Stephan
>>>> 
>>>> From within the VR, the logs show
>>>> 
>>>> 2015-11-24 11:24:33,807  CsFile.py search:123 Searching for
>>>> dhcp-range=interface:eth0,set:interface and replacing with
>>>> dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
>>>> 2015-11-24 11:24:33,808  merge.py load:56 Creating data bag type
>>>> guestnetwork
>>>> 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>>>> dhcp-option=tag:interface-eth0,15 and replacing with
>>>> dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
>>>> 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>>>> dhcp-option=tag:interface-eth0,6 and replacing with
>>>> dhcp-option=tag:interface
>>>> -eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
>>>> 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>>>> dhcp-option=tag:interface-eth0,3, and replacing with
>>>> dhcp-option=tag:interface-eth0,3,10.10.22.1
>>>> 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>>>> dhcp-option=tag:interface-eth0,1, and replacing with
>>>> dhcp-option=tag:interface-eth0,1,255.255.255.0
>>>> 2015-11-24 11:24:33,810  CsHelper.py execute:160 Executing: service
>>>> dnsmasq restart
>>>> 
>>>> ==> /var/log/messages <==
>>>> Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system
>>>> halt
>>>> 
>>>> Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
>>>> 
>>>> The system is going down for system halt NOW!
>>>> Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
>>>> 
>>>> ==> /var/log/cloud.log <==
>>>> /opt/cloud/bin/vr_cfg.sh: line 60:  6603
>>>> Killed                  /opt/cloud/bin/update_config.py
>>>> vm_dhcp_entry.json
>>>> 
>>>> ==> /var/log/messages <==
>>>> Nov 24 11:24:35 r-504-VM cloud: VR config: executing
>>>> failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>> 
>>>> ==> /var/log/cloud.log <==
>>>> Tue Nov 24 11:24:35 UTC 2015 : VR config: executing
>>>> failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>> Connection to 169.254.2.192 closed by remote host.
>>>> Connection to 169.254.2.192 closed.
>>>> 
>>>> the management-server.log shows
>>>> 
>>>> 2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>>>> (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing
>>>> com.cloud.vm.VmWorkStart for job-5164
>>>> 2015-11-24 12:24:43,017 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
>>>> (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job
>>>> -5164
>>>> from job monitoring
>>>> 2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher]
>>>> (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception
>>>> while
>>>> executing org.apache.cloudstack.api.command.admin.
>>>> router.StartRouterCmd
>>>> com.cloud.exception.AgentUnavailableException: Resource [Host:1] is
>>>> unreachable: Host 1: Unable to start instance due to Unable to
>>>> start
>>>> VM[DomainRouter|r-504-VM] due to error in f
>>>> inalizeStart, not retrying
>>>>          at
>>>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>>> ineManagerImpl.java:1121)
>>>>          at
>>>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>>> ineManagerImpl.java:4580)
>>>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>> Method)
>>>>          at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImp
>>>> l.java:57)
>>>>          at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcc
>>>> essorImpl.java:43)
>>>>          at java.lang.reflect.Method.invoke(Method.java:606)
>>>>          at
>>>> com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandler
>>>> Proxy.java:107)
>>>>          at
>>>> com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachi
>>>> neManagerImpl.java:4736)
>>>>          at
>>>> com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:10
>>>> 2)
>>>>          at
>>>> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>>>> $5.runInContext(AsyncJobManagerImpl.java:537)
>>>>          at
>>>> org.apache.cloudstack.managed.context.ManagedContextRunnable
>>>> $1.run(ManagedContextRunnable.java:49)
>>>>          at
>>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext
>>>> $1.call(DefaultManagedContext.java:56)
>>>>          at
>>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ca
>>>> llWithContext(DefaultManagedContext.java:103)
>>>>          at
>>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ru
>>>> nWithContext(DefaultManagedContext.java:53)
>>>>          at
>>>> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(Ma
>>>> nagedContextRunnable.java:46)
>>>>          at
>>>> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>>>> $5.run(AsyncJobManagerImpl.java:494)
>>>>          at java.util.concurrent.Executors
>>>> $RunnableAdapter.call(Executors.java:471)
>>>>          at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>          at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecuto
>>>> r.java:1145)
>>>>          at java.util.concurrent.ThreadPoolExecutor
>>>> $Worker.run(ThreadPoolExecutor.java:615)
>>>>          at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: com.cloud.utils.exception.ExecutionException: Unable to
>>>> start
>>>> VM[DomainRouter|r-504-VM] due to error in finalizeStart, not
>>>> retrying
>>>>          at
>>>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>>> ineManagerImpl.java:1085)
>>>>          at
>>>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>>> ineManagerImpl.java:4580)
>>>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>> Method)
>>>>          ... 18 more
>>>> 2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>>>> (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163,
>>>> jobStatus: FAILED, resultCode: 530, result: org.
>>>> apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[
>>>> ],"errorcode":530,"errortext":"Resource [Host:1] is unreachable:
>>>> Host 1: Unable to start instance due to Unable t
>>>> o start VM[DomainRouter|r-504-VM] due to error in finalizeStart,
>>>> not
>>>> retrying"}
> Daniel Mezentsev, founder
> (+1) 604 313 8592.
> Soleks Data Group.
> Shaping the clouds.

Re: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Posted by Daniel Mezentsev <da...@soleks.com>.
4.7.1 also affected.
All versions use 4.6 templates.

P.S.: would it be useful to move the discussion to the dev list? Looks like it's
a pretty big issue.

> Hi!
>
> After upgrading from 4.5.2 to 4.6.2, I also face this issue. We use both
> KVM and XenServer. Any stable workaround for this?
>
> Thanx
>
> Martin
>
> -----Original Message-----
> From: Andrei Mikhailovsky [mailto:andrei@arhont.com]
> Sent: Friday, 18 December 2015 02:55
> To: users <us...@cloudstack.apache.org>
> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>
> Hello guys,
>
> I've just upgraded to 4.6.2 and have similar issues with three virtual
> routers out of 22 in total. They are all failing exactly the same way as
> described here.
>
> Has anyone found a permanent workaround for this issue?
>
> Thanks
>
> Andrei
>
> ----- Original Message -----
>> From: "Stephan Seitz" <s....@secretresearchfacility.com>
>> To: "users" <us...@cloudstack.apache.org>
>> Sent: Monday, 30 November, 2015 19:53:57
>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>
>> Does anybody else experience problems due to (very) slow deployment of
>> VRs?
>>
>> On Tuesday, 24.11.2015, 16:31 +0100, Stephan Seitz wrote:
>>> Update / FYI:
>>> After faking the particular VRu in SQL, I tried to restart that
>>> network, and it always fails. To me it looks like
>>> update_config.py - which takes almost all CPU resources - runs way
>>> longer than any watchdog will accept.
>>>
>>> I'm able to mitigate that by very nasty workarounds:
>>> a) start the router
>>> b) wait until it's provisioned
>>> c) restart cloudstack-management
>>> d)  update vm_instance
>>> set state='Running',
>>> power_state='PowerOn' where name = 'r-XXX-VM';
>>> e) once: update domain_router
>>> set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC
>>> 2015", scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
>>> where id=XXX;
>>> f) wait until update_config.py finishes (for me that's about 15
>>> minutes)
>>>
>>> Since I expect the need for VR restarts in the future, this behaviour
>>> is somewhat unsatisfying. It needs a lot of error-prone intervention.
>>>
>>> I'm quite unsure whether it was introduced with the update or whether the
>>> particular VR simply has not been restarted after getting configured with
>>> lots of IPs and rules.
>>>
>>> On Tuesday, 24.11.2015, 12:29 +0100, Stephan Seitz wrote:
>>> Hi List!
>>>
>>> After upgrading from 4.5.2 to 4.6.0 I faced a problem with one
>>> virtualrouter. This particular VR has about 10 IPs w/ LB and FW
>>> rules
>>> defined. During the upgrade process, and after about 4-5 minutes a
>>> watchdog kicks in and kills the respective VR due to no response.
>>>
>>> So far I didn't find any timeout value in the global settings.
>>> Temporarily setting network.router.EnableServiceMonitoring to false
>>> doesn't change the behaviour.
>>>
>>> Any help, how to mitigate that nasty timeout would be really
>>> appreciated :)
>>>
>>> cheers,
>>>
>>> Stephan
>>>
>>> From within the VR, the logs show
>>>
>>> 2015-11-24 11:24:33,807  CsFile.py search:123 Searching for
>>> dhcp-range=interface:eth0,set:interface and replacing with
>>> dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
>>> 2015-11-24 11:24:33,808  merge.py load:56 Creating data bag type
>>> guestnetwork
>>> 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>>> dhcp-option=tag:interface-eth0,15 and replacing with
>>> dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
>>> 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>>> dhcp-option=tag:interface-eth0,6 and replacing with
>>> dhcp-option=tag:interface
>>> -eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
>>> 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>>> dhcp-option=tag:interface-eth0,3, and replacing with
>>> dhcp-option=tag:interface-eth0,3,10.10.22.1
>>> 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>>> dhcp-option=tag:interface-eth0,1, and replacing with
>>> dhcp-option=tag:interface-eth0,1,255.255.255.0
>>> 2015-11-24 11:24:33,810  CsHelper.py execute:160 Executing: service
>>> dnsmasq restart
>>>
>>> ==> /var/log/messages <==
>>> Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system
>>> halt
>>>
>>> Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
>>>
>>> The system is going down for system halt NOW!
>>> Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
>>>
>>> ==> /var/log/cloud.log <==
>>> /opt/cloud/bin/vr_cfg.sh: line 60:  6603
>>> Killed                  /opt/cloud/bin/update_config.py
>>> vm_dhcp_entry.json
>>>
>>> ==> /var/log/messages <==
>>> Nov 24 11:24:35 r-504-VM cloud: VR config: executing
>>> failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>
>>> ==> /var/log/cloud.log <==
>>> Tue Nov 24 11:24:35 UTC 2015 : VR config: executing
>>> failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>> Connection to 169.254.2.192 closed by remote host.
>>> Connection to 169.254.2.192 closed.
>>>
>>> the management-server.log shows
>>>
>>> 2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>>> (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing
>>> com.cloud.vm.VmWorkStart for job-5164
>>> 2015-11-24 12:24:43,017 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
>>> (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job
>>> -5164
>>> from job monitoring
>>> 2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher]
>>> (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception
>>> while
>>> executing org.apache.cloudstack.api.command.admin.
>>> router.StartRouterCmd
>>> com.cloud.exception.AgentUnavailableException: Resource [Host:1] is
>>> unreachable: Host 1: Unable to start instance due to Unable to
>>> start
>>> VM[DomainRouter|r-504-VM] due to error in f
>>> inalizeStart, not retrying
>>>          at
>>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>> ineManagerImpl.java:1121)
>>>          at
>>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>> ineManagerImpl.java:4580)
>>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>> Method)
>>>          at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImp
>>> l.java:57)
>>>          at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcc
>>> essorImpl.java:43)
>>>          at java.lang.reflect.Method.invoke(Method.java:606)
>>>          at
>>> com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandler
>>> Proxy.java:107)
>>>          at
>>> com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachi
>>> neManagerImpl.java:4736)
>>>          at
>>> com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:10
>>> 2)
>>>          at
>>> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>>> $5.runInContext(AsyncJobManagerImpl.java:537)
>>>          at
>>> org.apache.cloudstack.managed.context.ManagedContextRunnable
>>> $1.run(ManagedContextRunnable.java:49)
>>>          at
>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext
>>> $1.call(DefaultManagedContext.java:56)
>>>          at
>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ca
>>> llWithContext(DefaultManagedContext.java:103)
>>>          at
>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ru
>>> nWithContext(DefaultManagedContext.java:53)
>>>          at
>>> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(Ma
>>> nagedContextRunnable.java:46)
>>>          at
>>> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>>> $5.run(AsyncJobManagerImpl.java:494)
>>>          at java.util.concurrent.Executors
>>> $RunnableAdapter.call(Executors.java:471)
>>>          at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>          at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecuto
>>> r.java:1145)
>>>          at java.util.concurrent.ThreadPoolExecutor
>>> $Worker.run(ThreadPoolExecutor.java:615)
>>>          at java.lang.Thread.run(Thread.java:745)
>>> Caused by: com.cloud.utils.exception.ExecutionException: Unable to
>>> start
>>> VM[DomainRouter|r-504-VM] due to error in finalizeStart, not
>>> retrying
>>>          at
>>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>> ineManagerImpl.java:1085)
>>>          at
>>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>>> ineManagerImpl.java:4580)
>>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>> Method)
>>>          ... 18 more
>>> 2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>>> (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163,
>>> jobStatus: FAILED, resultCode: 530, result: org.
>>> apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[
>>> ],"errorcode":530,"errortext":"Resource [Host:1] is unreachable:
>>> Host 1: Unable to start instance due to Unable t
>>> o start VM[DomainRouter|r-504-VM] due to error in finalizeStart,
>>> not
>>> retrying"}
>>>
>>>  
>
>  
  Daniel Mezentsev, founder
(+1) 604 313 8592.
Soleks Data Group.
Shaping the clouds.

AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Posted by Martin Emrich <ma...@empolis.com>.
Hi!

After upgrading from 4.5.2 to 4.6.2, I also face this issue. We use both KVM and XenServer. Any stable workaround for this?

Thanx

Martin



-----Original Message-----
From: Andrei Mikhailovsky [mailto:andrei@arhont.com]
Sent: Friday, 18 December 2015 02:55
To: users <us...@cloudstack.apache.org>
Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Hello guys,

I've just upgraded to 4.6.2 and have similar issues with three virtual routers out of 22 in total. They are all failing exactly the same way as described here.

Has anyone found a permanent workaround for this issue?

Thanks

Andrei

----- Original Message -----
> From: "Stephan Seitz" <s....@secretresearchfacility.com>
> To: "users" <us...@cloudstack.apache.org>
> Sent: Monday, 30 November, 2015 19:53:57
> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

> Does anybody else experience problems due to (very) slow deployment of
> VRs?
> 
> 
> On Tuesday, 24.11.2015, 16:31 +0100, Stephan Seitz wrote:
>> Update / FYI:
>> After faking the particular VRu in SQL, I tried to restart that
>> network, and it always fails. To me it looks like
>> update_config.py - which takes almost all CPU resources - runs way
>> longer than any watchdog will accept.
>> 
>> I'm able to mitigate that by very nasty workarounds:
>> a) start the router
>> b) wait until it's provisioned
>> c) restart cloudstack-management
>> d)  update vm_instance
>> set state='Running',
>> power_state='PowerOn' where name = 'r-XXX-VM';
>> e) once: update domain_router
>> set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 
>> 2015", scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
>> where id=XXX;
>> f) wait until update_config.py finishes (for me that's about 15
>> minutes)
>> 
>> Since I expect the need for VR restarts in the future, this behaviour
>> is somewhat unsatisfying. It needs a lot of error-prone intervention.
>> 
>> I'm quite unsure whether it was introduced with the update or whether the
>> particular VR simply has not been restarted after getting configured with
>> lots of IPs and rules.
>> 
>> 
>> On Tuesday, 24.11.2015, 12:29 +0100, Stephan Seitz wrote:
>> > Hi List!
>> > 
>> > After upgrading from 4.5.2 to 4.6.0 I faced a problem with one
>> > virtualrouter. This particular VR has about 10 IPs w/ LB and FW
>> > rules
>> > defined. During the upgrade process, and after about 4-5 minutes a
>> > watchdog kicks in and kills the respective VR due to no response.
>> > 
>> > So far I didn't find any timeout value in the global settings.
>> > Temporarily setting network.router.EnableServiceMonitoring to false
>> > doesn't change the behaviour.
>> > 
>> > Any help, how to mitigate that nasty timeout would be really
>> > appreciated :)
>> > 
>> > cheers,
>> > 
>> > Stephan
>> > 
>> > From within the VR, the logs show
>> > 
>> > 2015-11-24 11:24:33,807  CsFile.py search:123 Searching for
>> > dhcp-range=interface:eth0,set:interface and replacing with
>> > dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
>> > 2015-11-24 11:24:33,808  merge.py load:56 Creating data bag type
>> > guestnetwork
>> > 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,15 and replacing with
>> > dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
>> > 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,6 and replacing with
>> > dhcp-option=tag:interface
>> > -eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
>> > 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,3, and replacing with
>> > dhcp-option=tag:interface-eth0,3,10.10.22.1
>> > 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,1, and replacing with
>> > dhcp-option=tag:interface-eth0,1,255.255.255.0
>> > 2015-11-24 11:24:33,810  CsHelper.py execute:160 Executing: service
>> > dnsmasq restart
>> > 
>> > ==> /var/log/messages <==
>> > Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system
>> > halt
>> > 
>> > Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
>> > 
>> > The system is going down for system halt NOW!
>> > Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
>> > 
>> > ==> /var/log/cloud.log <==
>> > /opt/cloud/bin/vr_cfg.sh: line 60:  6603
>> > Killed                  /opt/cloud/bin/update_config.py
>> > vm_dhcp_entry.json
>> > 
>> > ==> /var/log/messages <==
>> > Nov 24 11:24:35 r-504-VM cloud: VR config: executing
>> > failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>> > 
>> > ==> /var/log/cloud.log <==
>> > Tue Nov 24 11:24:35 UTC 2015 : VR config: executing
>> > failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>> > Connection to 169.254.2.192 closed by remote host.
>> > Connection to 169.254.2.192 closed.
>> > 
>> > 
>> > the management-server.log shows
>> > 
>> > 2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> > (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing
>> > com.cloud.vm.VmWorkStart for job-5164
>> > 2015-11-24 12:24:43,017 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
>> > (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job
>> > -5164
>> > from job monitoring
>> > 2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher]
>> > (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception
>> > while
>> > executing org.apache.cloudstack.api.command.admin.
>> > router.StartRouterCmd
>> > com.cloud.exception.AgentUnavailableException: Resource [Host:1] is
>> > unreachable: Host 1: Unable to start instance due to Unable to
>> > start
>> > VM[DomainRouter|r-504-VM] due to error in f
>> > inalizeStart, not retrying
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>> > ineManagerImpl.java:1121)
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>> > ineManagerImpl.java:4580)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> > Method)
>> >         at
>> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImp
>> > l.java:57)
>> >         at
>> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcc
>> > essorImpl.java:43)
>> >         at java.lang.reflect.Method.invoke(Method.java:606)
>> >         at
>> > com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandler
>> > Proxy.java:107)
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachi
>> > neManagerImpl.java:4736)
>> >         at
>> > com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:10
>> > 2)
>> >         at
>> > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>> > $5.runInContext(AsyncJobManagerImpl.java:537)
>> >         at
>> > org.apache.cloudstack.managed.context.ManagedContextRunnable
>> > $1.run(ManagedContextRunnable.java:49)
>> >         at
>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext
>> > $1.call(DefaultManagedContext.java:56)
>> >         at
>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ca
>> > llWithContext(DefaultManagedContext.java:103)
>> >         at
>> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.ru
>> > nWithContext(DefaultManagedContext.java:53)
>> >         at
>> > org.apache.cloudstack.managed.context.ManagedContextRunnable.run(Ma
>> > nagedContextRunnable.java:46)
>> >         at
>> > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl
>> > $5.run(AsyncJobManagerImpl.java:494)
>> >         at java.util.concurrent.Executors
>> > $RunnableAdapter.call(Executors.java:471)
>> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >         at
>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecuto
>> > r.java:1145)
>> >         at java.util.concurrent.ThreadPoolExecutor
>> > $Worker.run(ThreadPoolExecutor.java:615)
>> >         at java.lang.Thread.run(Thread.java:745)
>> > Caused by: com.cloud.utils.exception.ExecutionException: Unable to
>> > start
>> > VM[DomainRouter|r-504-VM] due to error in finalizeStart, not
>> > retrying
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>> > ineManagerImpl.java:1085)
>> >         at
>> > com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMach
>> > ineManagerImpl.java:4580)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> > Method)
>> >         ... 18 more
>> > 2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> > (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163,
>> > jobStatus: FAILED, resultCode: 530, result: org.
>> > apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[
>> > ],"errorcode":530,"errortext":"Resource [Host:1] is unreachable:
>> > Host 1: Unable to start instance due to Unable t
>> > o start VM[DomainRouter|r-504-VM] due to error in finalizeStart,
>> > not
>> > retrying"}
>> > 
>> > 
>> > 
>> > 
>> > 
>> > 
>>