Posted to users@cloudstack.apache.org by Arnaud Gaillard <ar...@xtendsys.net> on 2012/11/12 20:07:32 UTC

HA getting crazy after management server restart (3.0.2)

Hello,

We rebooted the management server to check whether it had an impact on a
small display bug we spotted. Since then our whole infrastructure has been
going crazy.

After the reboot all 17 of our nodes went to the
Disconnected/Alert/Connecting state, and the HA-Worker is complaining that
the various hosts are unreachable. (Please note that no other changes were
made, the network is fine, and no iptables/firewall rules are blocking the
communication.)
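
A minimal sketch, in Python, of one way to double-check the "no firewall in the way" claim from a host. It assumes the management server listens on 172.16.11.10:8250, the address the agent log in this message connects to; the function name is made up for illustration. A successful TCP connect only shows the port is reachable; it says nothing about the SSL handshake or the agent protocol on top of it.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds in time."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False
```

Run from a hypervisor, `can_connect("172.16.11.10", 8250)` returning False would point at the network rather than at the management server itself.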

For instance:
Unable to reach the agent for VM[ConsoleProxy|v-189-VM]: Resource [Host:78]
is unreachable: Host 78: Host is not in the right state: Disconnected

and

2012-11-12 15:22:47,849 INFO  [agent.manager.AgentMonitor] (AgentMonitor:null) Found the following agents behind on ping: [75, 52, 4]
2012-11-12 15:22:47,851 DEBUG [cloud.host.Status] (AgentMonitor:null) Ping timeout for host 75, do invstigation
2012-11-12 15:22:47,853 DEBUG [cloud.host.Status] (AgentMonitor:null) Ping timeout for host 52, do invstigation
2012-11-12 15:22:47,853 INFO  [agent.manager.AgentManagerImpl] (AgentTaskPool-5:null) Investigating why host 75 has disconnected with event PingTimeout
2012-11-12 15:22:47,854 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-5:null) checking if agent (75) is alive
2012-11-12 15:22:47,855 INFO  [agent.manager.AgentManagerImpl] (AgentTaskPool-6:null) Investigating why host 52 has disconnected with event PingTimeout
2012-11-12 15:22:47,855 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-6:null) checking if agent (52) is alive
2012-11-12 15:22:47,856 DEBUG [cloud.host.Status] (AgentMonitor:null) Ping timeout for host 4, do invstigation

It seems the ping check failed because all hosts are seen as down.
All the nodes are running fine and are connected to the server (TCP status:
connected), but the management server does not seem to see them.
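
To see which hosts the management server considers behind on ping across a longer log, the AgentMonitor summary lines quoted above can be collected with a short script. This is a rough sketch in Python: the function name is made up for illustration, and the pattern is written only against the message text shown in this post, so other CloudStack versions may word it differently.

```python
import re

# Matches the AgentMonitor summary line quoted in this message. \s also
# matches newlines, so a list wrapped by a mail client is still picked up.
BEHIND_ON_PING = re.compile(r"agents behind on ping:\s*\[([\d,\s]*)\]")


def hosts_behind_on_ping(log_text):
    """Return the sorted, de-duplicated host IDs reported as behind on ping."""
    host_ids = set()
    for match in BEHIND_ON_PING.finditer(log_text):
        for token in match.group(1).split(","):
            token = token.strip()
            if token:
                host_ids.add(int(token))
    return sorted(host_ids)


sample = (
    "2012-11-12 15:22:47,849 INFO  [agent.manager.AgentMonitor]\n"
    "(AgentMonitor:null) Found the following agents behind on ping: [75, 52, 4]\n"
)
print(hosts_behind_on_ping(sample))  # [4, 52, 75]
```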

The interface shows the status Connecting, but the nodes never go back
to the Connected state. The client log says:


2012-11-12 15:39:36,202 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting to 172.16.11.10:8250
2012-11-12 15:39:36,295 INFO  [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
2012-11-12 15:39:41,296 INFO  [cloud.agent.Agent] (Agent Timer:null) Connected to the server
2012-11-12 15:45:37,635 INFO  [cloud.agent.Agent] (Agent Timer:null) The startup command is now cancelled
2012-11-12 15:45:42,636 INFO  [cloud.agent.Agent] (Agent Timer:null) Lost connection to the server. Dealing with the remaining commands...

Here the agent tries to connect to the management server but gets
disconnected for an unknown reason.
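
The timestamps in the agent log above give a feel for how long the agent waited before giving up. A small sketch in Python, using only the two timestamps quoted in this message (the helper name is made up for illustration):

```python
from datetime import datetime

# Timestamp layout used by the log lines quoted in this message:
# date, time, then a comma and milliseconds.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(first_stamp, second_stamp):
    """Return the number of seconds from first_stamp to second_stamp."""
    first = datetime.strptime(first_stamp, LOG_TIME_FORMAT)
    second = datetime.strptime(second_stamp, LOG_TIME_FORMAT)
    return (second - first).total_seconds()


connected = "2012-11-12 15:39:41,296"  # "Connected to the server"
cancelled = "2012-11-12 15:45:37,635"  # "The startup command is now cancelled"
print(round(seconds_between(connected, cancelled), 3))  # 356.339
```

So the startup command was cancelled just under six minutes after the agent reported being connected. The log does not say whether that corresponds to a configured timeout.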

The only error that catches my eye in the log is:

2012-11-12 15:17:48,945 ERROR [cloud.servlet.CloudStartupServlet] (main:null) Exception starting management server
java.lang.NumberFormatException: For input string: "false"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:481)
        at java.lang.Integer.parseInt(Integer.java:514)
        at com.cloud.api.ApiServer.init(ApiServer.java:282)
        at com.cloud.api.ApiServer.initApiServer(ApiServer.java:159)
        at com.cloud.servlet.CloudStartupServlet.init(CloudStartupServlet.java:46)
        at javax.servlet.GenericServlet.init(GenericServlet.java:212)
        at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1173)
        at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:993)
        at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4187)
        at org.apache.catalina.core.StandardContext.start(StandardContext.java:4496)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
        at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1041)
        at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:964)
        at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
        at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
        at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
        at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
        at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
        at org.apache.catalina.core.StandardService.start(StandardService.java:516)
        at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:593)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
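
The trace shows Integer.parseInt being handed the string "false" inside ApiServer.init, which suggests that some value expected to be an integer holds a boolean string instead. The trace does not say which setting that is. As a rough sketch in Python, a dump of key/value settings could be screened for this kind of mismatch as below; the setting names, the function name and the list of integer-typed keys are all hypothetical, and the regular expression only approximates what Java's Integer.parseInt accepts.

```python
import re

# Approximation of an integer literal: optional sign followed by digits.
INTEGER_LITERAL = re.compile(r"[+-]?[0-9]+")


def non_integer_settings(settings, integer_keys):
    """Return {key: value} for keys expected to be integers whose values are not."""
    bad = {}
    for key in integer_keys:
        value = settings.get(key)
        if not isinstance(value, str) or not INTEGER_LITERAL.fullmatch(value):
            bad[key] = value
    return bad


# Hypothetical setting names and values, for illustration only.
settings = {"example.port": "false", "example.timeout": "60"}
print(non_integer_settings(settings, ["example.port", "example.timeout"]))
# {'example.port': 'false'}
```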

We are clueless about what is causing this issue, as all the nodes (from 3
different zones) are seen as down when they are in fact running fine.

This is creating a big mess in our production environment, so any idea
would be useful!

Thanks,

Re: HA getting crazy after management server restart (3.0.2)

Posted by Bryan Whitehead <dr...@megahappy.net>.

What are the specific values you upped? I'd like to preemptively take
a look at these myself - would prefer HA not going crazy. :)

On Mon, Nov 12, 2012 at 3:12 PM, Arnaud Gaillard
<ar...@xtendsys.net> wrote:
> Thanks, we did increase the timeout value, and everything seems to be back
> in order.
>
> On Mon, Nov 12, 2012 at 8:21 PM, Caleb Call <ca...@me.com> wrote:
>
>> I had the same thing happen to my environment.  I thought it was just my
>> older hardware.  I ended up upping the check values in the global settings
>> and it hasn't reoccurred since (it happened three times before making these
>> changes).  One thing that helped slightly was adding hosts entries for my
>> hypervisors to the management server.

Re: HA getting crazy after management server restart (3.0.2)

Posted by Arnaud Gaillard <ar...@xtendsys.net>.

Thanks, we did increase the timeout value, and everything seems to be back
in order.





On Mon, Nov 12, 2012 at 8:21 PM, Caleb Call <ca...@me.com> wrote:

> I had the same thing happen to my environment.  I thought it was just my
> older hardware.  I ended up upping the check values in the global settings
> and it hasn't reoccurred since (it happened three times before making these
> changes).  One thing that helped slightly was adding hosts entries for my
> hypervisors to the management server.

Re: HA getting crazy after management server restart (3.0.2)

Posted by Caleb Call <ca...@me.com>.

I had the same thing happen to my environment.  I thought it was just my older hardware.  I ended up upping the check values in the global settings and it hasn't reoccurred since (it happened three times before making these changes).  One thing that helped slightly was adding hosts entries for my hypervisors to the management server.
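
One possible reason a hosts entry helps, offered as an assumption rather than something confirmed in this thread, is that name lookups for the hypervisors become fast and local. A minimal sketch in Python for timing a lookup on the management server; the function name is made up, and "localhost" stands in for a real hypervisor name:

```python
import socket
import time


def time_lookup(hostname):
    """Return (address or None, seconds elapsed) for one forward lookup."""
    start = time.monotonic()
    try:
        address = socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror is a subclass of OSError; treat failure as "no address".
        address = None
    return address, time.monotonic() - start


address, elapsed = time_lookup("localhost")
print(address, round(elapsed, 4))
```

Comparing the elapsed time before and after adding the hosts entries would show whether resolution was a factor.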

