Posted to users@cloudstack.apache.org by dimas yoga pratama <sm...@gmail.com> on 2014/06/19 16:04:10 UTC

[URGENT] cloudstack SSVM and router failed to start after power blackout

OK, this is my problem: after a blackout I can't start the virtual router, and
the SSVM is not detected in my CloudStack system. The SSVM recreated itself but
is stuck in the Starting state.

What should I do? Please help me.

Re: [URGENT] cloudstack SSVM and router failed to start after power blackout

Posted by dimas yoga pratama <sm...@gmail.com>.
Thanks Rohit, I will give feedback as soon as I've tried it.


Re: [URGENT] cloudstack SSVM and router failed to start after power blackout

Posted by Rohit Yadav <bh...@apache.org>.
Hi Dimas,

It looks like the VM is in the Starting state and CloudStack is unable to
contact the agent. I hope you've removed the VR from CloudStack using the UI.
You can try restarting the management server. This is a sync issue, where one
party (the mgmt server) has a different view of the world than the other (the
host/agent). In such cases, do not remove the host; otherwise, when you re-add
it, it may destroy all the (user) VMs on it or simply fail.
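To make the "different view of the world" concrete, here is a toy Python sketch. It is not CloudStack's actual sync code; the function name, the state groupings and the VM names are hypothetical. It compares the states the management server has recorded with the set of VMs a host agent reports and lists where the two disagree.

```python
from typing import Dict, List, Set, Tuple

UP_STATES = ("Running", "Starting")
DOWN_STATES = ("Stopped", "Destroyed", "Expunging")


def find_out_of_sync(db_states: Dict[str, str],
                     host_vms: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (vm_name, db_state, host_view) for each VM the two views disagree on."""
    mismatches = []
    for name, state in sorted(db_states.items()):
        on_host = name in host_vms
        if state in UP_STATES and not on_host:
            # The DB thinks the VM is up or coming up, but the host has no such VM.
            mismatches.append((name, state, "absent"))
        elif state in DOWN_STATES and on_host:
            # The host still runs a VM that the DB considers down.
            mismatches.append((name, state, "running"))
    for name in sorted(host_vms - set(db_states)):
        # VMs on the host that the DB does not know about at all.
        mismatches.append((name, "unknown", "running"))
    return mismatches


# Hypothetical snapshot: r-71-VM recorded as Starting (as in the management
# log) but not reported by the host; i-2-10-VM is a made-up user VM.
print(find_out_of_sync({"r-71-VM": "Starting", "i-2-10-VM": "Running"},
                       {"i-2-10-VM"}))
# [('r-71-VM', 'Starting', 'absent')]
```

Each mismatch is something one side has to give up on; restarting the management server makes it rebuild its view from the agents when they reconnect.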

If restarting doesn't fix the problem, reduce the expunge timeout in the global
settings (that's when CloudStack marks a VM as removed; since you've just
destroyed it, it can take some time to get expunged) and try again.
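A minimal sketch of lowering the expunge settings through the API instead of the UI, assuming CloudStack's standard HMAC-SHA1 request signing and the `updateConfiguration` command. The relevant global settings are `expunge.delay` and `expunge.interval` (both in seconds). The endpoint, keys and the value `60` are placeholders, and depending on the version a management server restart may be needed before new values take effect.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def signed_api_url(endpoint: str, api_key: str, secret_key: str,
                   params: dict) -> str:
    """Build a signed CloudStack API GET URL using HMAC-SHA1 request signing."""
    all_params = dict(params, apiKey=api_key, response="json")
    # 1. Sort the parameters by field name and URL-encode each value
    #    (spaces must become %20, not +).
    pairs = sorted(all_params.items(), key=lambda kv: kv[0].lower())
    query = "&".join(f"{k}={quote(str(v), safe='')}" for k, v in pairs)
    # 2. Sign the lowercased query string with the secret key.
    digest = hmac.new(secret_key.encode(), query.lower().encode(),
                      hashlib.sha1).digest()
    # 3. Base64-encode the signature, URL-encode it and append it.
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"{endpoint}?{query}&signature={signature}"


# Placeholder endpoint and keys: use your management server and your own keys.
url = signed_api_url(
    "http://mgmt-server:8080/client/api", "MY_API_KEY", "MY_SECRET_KEY",
    {"command": "updateConfiguration", "name": "expunge.delay", "value": "60"},
)
print(url)
```

The resulting URL can be fetched with `urllib.request.urlopen(url)`; repeat with `"name": "expunge.interval"` to shorten the interval at which the expunge thread runs.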

As a final course of action, I would stop the management server, then ssh to
the host and destroy the SSVMs. Using the mysql client, I would change the db
entries for the SSVM to removed/expunged (simply mark them by updating the
row; do not delete the row), then start the mgmt server again and hope it
works this time.
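The "mark, don't delete" step can be sketched as below. To stay runnable without a live database, it uses an in-memory SQLite table standing in for CloudStack's `vm_instance` table; the reduced schema, the `s-1-VM` name and the choice of `Expunging` as the new state are assumptions. Against the real MySQL `cloud` database, the equivalent would be an `UPDATE vm_instance SET state = ..., removed = NOW() WHERE ...` run from the mysql client with the management server stopped, ideally after taking a backup.

```python
import sqlite3
from datetime import datetime, timezone

# Assumption: a cut-down stand-in for CloudStack's vm_instance table; the real
# MySQL table has many more columns. Only the ones touched here are modelled.
SCHEMA = """CREATE TABLE vm_instance (
    id INTEGER PRIMARY KEY, name TEXT, type TEXT, state TEXT, removed TEXT)"""


def mark_removed(conn: sqlite3.Connection, vm_name: str,
                 new_state: str = "Expunging") -> int:
    """Soft-delete a VM row by setting state and removed; never DELETE it."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    cur = conn.execute(
        "UPDATE vm_instance SET state = ?, removed = ? "
        "WHERE name = ? AND removed IS NULL",
        (new_state, now, vm_name),
    )
    conn.commit()
    return cur.rowcount  # number of rows that were marked


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute("INSERT INTO vm_instance (name, type, state) "
             "VALUES ('s-1-VM', 'SecondaryStorageVm', 'Starting')")
print(mark_removed(conn, "s-1-VM"))  # 1
print(conn.execute("SELECT COUNT(*) FROM vm_instance").fetchone()[0])  # 1: row kept
```

The `removed IS NULL` guard makes the update idempotent: running it twice marks nothing the second time.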

Any suggestions from anyone in such a case?

Regards.


Re: [URGENT] cloudstack SSVM and router failed to start after power blackout

Posted by dimas yoga pratama <sm...@gmail.com>.
Management log:

2014-06-19 21:49:21,312 WARN  [c.c.u.n.Link] (AgentManager-Selector:null)
SSL: Fail to find the generated keystore. Loading fail-safe one to continue.

2014-06-19 21:49:11,585 DEBUG [c.c.v.VirtualMachineManagerImpl]
(AgentConnectTaskPool-344:ctx-07431045) Ignoring VM in starting mode:
r-71-VM
2014-06-19 21:49:11,585 DEBUG [c.c.h.HighAvailabilityManagerImpl]
(AgentConnectTaskPool-344:ctx-07431045) VM does not require investigation
so I'm marking it as Stopped: VM[DomainRouter|r-71-VM]
2014-06-19 21:49:11,585 WARN  [o.a.c.f.j.AsyncJobExecutionContext]
(AgentConnectTaskPool-344:ctx-07431045) Job is executed without a context,
setup psudo job for the executing thread
2014-06-19 21:49:11,651 DEBUG [c.c.v.VirtualMachineManagerImpl]
(AgentConnectTaskPool-344:ctx-07431045) Unable to transition the state but
we're moving on because it's forced stop
2014-06-19 21:49:11,651 DEBUG [c.c.v.VirtualMachineManagerImpl]
(AgentConnectTaskPool-344:ctx-07431045) Unable to cleanup VM:
VM[DomainRouter|r-71-VM] ,since outstanding work item is not found
2014-06-19 21:49:11,651 ERROR [c.c.a.m.AgentManagerImpl]
(AgentConnectTaskPool-344:ctx-07431045) Monitor
ClusteredVirtualMachineManagerImpl says there is an error in the connect
process for 2 due to Work item not found, We cannot stop
VM[DomainRouter|r-71-VM] when it is in state Starting
com.cloud.utils.exception.CloudRuntimeException: Work item not found, We
cannot stop VM[DomainRouter|r-71-VM] when it is in state Starting
        at
com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1415)
        at
com.cloud.vm.VirtualMachineManagerImpl.orchestrateStop(VirtualMachineManagerImpl.java:1344)
        at
com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1312)
        at
com.cloud.ha.HighAvailabilityManagerImpl.scheduleRestart(HighAvailabilityManagerImpl.java:346)
        at
com.cloud.vm.VirtualMachineManagerImpl.compareState(VirtualMachineManagerImpl.java:2827)
        at
com.cloud.vm.VirtualMachineManagerImpl.fullHostSync(VirtualMachineManagerImpl.java:2384)
        at
com.cloud.vm.VirtualMachineManagerImpl.processConnect(VirtualMachineManagerImpl.java:3035)
        at
com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:495)
        at
com.cloud.agent.manager.AgentManagerImpl.handleConnectedAgent(AgentManagerImpl.java:999)
        at
com.cloud.agent.manager.AgentManagerImpl.access$000(AgentManagerImpl.java:117)
        at
com.cloud.agent.manager.AgentManagerImpl$HandleAgentConnectTask.runInContext(AgentManagerImpl.java:1082)
        at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
2014-06-19 21:49:08,251 DEBUG [c.c.s.s.SecondaryStorageManagerImpl]
(secstorage-1:ctx-c407a559) Zone 1 is not ready to launch secondary storage
VM yet
2014-06-19 21:49:08,282 DEBUG [c.c.c.ConsoleProxyManagerImpl]
(consoleproxy-1:ctx-60f4b3c9) Zone 1 is not ready to launch console proxy
yet
2014-06-19 21:49:01,197 DEBUG [c.c.n.NetworkUsageManagerImpl]
(AgentConnectTaskPool-342:ctx-b66d3294) Disconnected called on 2 with
status Alert
2014-06-19 21:49:01,197 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentConnectTaskPool-342:ctx-b66d3294) Sending Disconnect to listener:
com.cloud.consoleproxy.ConsoleProxyListener
2014-06-19 21:49:01,198 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-342:ctx-b66d3294) Transition:[Resource state =
Enabled, Agent event = AgentDisconnected, Host id = 2, name =
host1.cloud.priv]
2014-06-19 21:49:01,259 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-342:ctx-b66d3294) Agent status update: [id = 2; name
= host1.cloud.priv; old status = Connecting; event = AgentDisconnected; new
status = Alert; old update count = 404; new update count = 405]
2014-06-19 21:49:01,259 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentConnectTaskPool-342:ctx-b66d3294) Notifying other nodes of to
disconnect
2014-06-19 21:49:01,260 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentConnectTaskPool-342:ctx-b66d3294) Failed to handle host connection:
com.cloud.utils.exception.CloudRuntimeException: Unable to connect 2
2014-06-19 21:49:01,261 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentConnectTaskPool-342:ctx-b66d3294) Can not send command
com.cloud.agent.api.ReadyCommand due to Host 2 is not up
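The management log above was wrapped by the mail client, which makes it hard to grep. A small Python sketch (a hypothetical helper, not part of CloudStack) that re-joins wrapped lines, assuming every entry starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp, and pulls out the agent status transitions, such as host 2 going from Connecting to Alert:

```python
import re
from typing import Iterable, List, Tuple

# Assumption: each entry starts with a timestamp; other lines are continuations.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
STATUS_UPDATE = re.compile(
    r"Agent status update: \[id = (\d+); name = ([^;]+); "
    r"old status = (\w+); event = (\w+); new status = (\w+);"
)


def join_entries(lines: Iterable[str]) -> List[str]:
    """Re-join wrapped log lines into one string per log entry."""
    entries: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def status_transitions(lines: Iterable[str]) -> List[Tuple[int, str, str, str, str]]:
    """Extract (host_id, host_name, old_status, event, new_status) tuples."""
    found = []
    for entry in join_entries(lines):
        m = STATUS_UPDATE.search(entry)
        if m:
            host_id, name, old, event, new = m.groups()
            found.append((int(host_id), name, old, event, new))
    return found


sample = """\
2014-06-19 21:49:01,259 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-342:ctx-b66d3294) Agent status update: [id = 2; name
= host1.cloud.priv; old status = Connecting; event = AgentDisconnected; new
status = Alert; old update count = 404; new update count = 405]
""".splitlines()
print(status_transitions(sample))
# [(2, 'host1.cloud.priv', 'Connecting', 'AgentDisconnected', 'Alert')]
```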


Host log:

2014-06-19 21:19:35,890 INFO  [cloud.agent.Agent] (Agent-Handler-3:null)
Lost connection to the server. Dealing with the remaining commands...
2014-06-19 21:19:40,891 INFO  [cloud.agent.Agent] (Agent-Handler-3:null)
Reconnecting...
2014-06-19 21:19:40,891 INFO  [utils.nio.NioClient] (Agent-Selector:null)
Connecting to 10.151.32.51:8250
2014-06-19 21:19:40,989 INFO  [utils.nio.NioClient] (Agent-Selector:null)
SSL: Handshake done
2014-06-19 21:19:40,989 INFO  [utils.nio.NioClient] (Agent-Selector:null)
Connected to 10.151.32.51:8250
2014-06-19 21:19:41,084 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
Proccess agent startup answer, agent id = 0
2014-06-19 21:19:41,084 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
Set agent id 0
2014-06-19 21:19:41,085 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
Startup Response Received: agent id = 0
2014-06-19 21:19:45,990 INFO  [cloud.agent.Agent] (Agent-Handler-3:null)
Connected to the server
2014-06-19 21:19:46,595 INFO  [cloud.agent.Agent] (Agent-Handler-3:null)
Lost connection to the server. Dealing with the remaining commands...
2014-06-19 21:19:51,596 INFO  [cloud.agent.Agent] (Agent-Handler-3:null)
Reconnecting...
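In the host log above, the agent connects and then loses the connection about 0.6 seconds later before reconnecting again. A hypothetical helper that measures how long each connection lasted, matching the `Connected to the server` and `Lost connection to the server` messages and assuming the timestamp format shown:

```python
from datetime import datetime
from typing import Iterable, List

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def connection_lifetimes(lines: Iterable[str]) -> List[float]:
    """Seconds between each 'Connected to the server' and the next
    'Lost connection to the server' in an agent log."""
    # Re-join wrapped entries: a timestamped header line plus its message.
    entries: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line[:4].isdigit() and line[4:5] == "-":
            entries.append(line)
        elif line and entries:
            entries[-1] += " " + line
    lifetimes: List[float] = []
    connected_at = None
    for entry in entries:
        ts = datetime.strptime(entry[:23], TS_FORMAT)
        if "Connected to the server" in entry:
            connected_at = ts
        elif "Lost connection to the server" in entry and connected_at is not None:
            lifetimes.append((ts - connected_at).total_seconds())
            connected_at = None
    return lifetimes


sample_host_log = """\
2014-06-19 21:19:35,890 INFO  [cloud.agent.Agent] (Agent-Handler-3:null)
Lost connection to the server. Dealing with the remaining commands...
2014-06-19 21:19:45,990 INFO  [cloud.agent.Agent] (Agent-Handler-3:null)
Connected to the server
2014-06-19 21:19:46,595 INFO  [cloud.agent.Agent] (Agent-Handler-3:null)
Lost connection to the server. Dealing with the remaining commands...
""".splitlines()
print(connection_lifetimes(sample_host_log))  # [0.605]
```

The first "Lost connection" entry is skipped because no matching "Connected" entry precedes it in the excerpt.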



I'm using CentOS 6.5 and CloudStack 4.3 with basic networking.

Please help me.


Re: [URGENT] cloudstack SSVM and router failed to start after power blackout

Posted by Rohit Yadav <bh...@apache.org>.
I'm not sure what the specific issue could be. You can tail the management
server logs to see what is failing. Once you've figured out the specific
issue, please share it with us along with your host OS, CloudStack version
and the connected host details.
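A minimal `tail -f`-style sketch in Python that follows a log file and keeps only WARN and ERROR entries. The log path in the usage comment is an assumption (the usual location for CloudStack 4.x packages on CentOS); adjust it to your installation.

```python
import time
from typing import Iterable, Iterator

LEVELS = ("WARN", "ERROR")


def follow(path: str, from_start: bool = False, poll_interval: float = 1.0,
           max_idle_polls: int = -1) -> Iterator[str]:
    """Yield complete lines appended to `path`, roughly like `tail -f`.

    A negative max_idle_polls follows forever; otherwise the generator stops
    after that many consecutive polls that found no new data.
    """
    with open(path, "r", errors="replace") as fh:
        if not from_start:
            fh.seek(0, 2)  # start at the end of the file, as tail -f does
        pending, idle = "", 0
        while max_idle_polls < 0 or idle < max_idle_polls:
            chunk = fh.readline()
            if chunk.endswith("\n"):
                idle = 0
                yield (pending + chunk).rstrip("\n")
                pending = ""
            elif chunk:
                pending += chunk  # partial line: wait for the writer to finish it
            else:
                idle += 1
                time.sleep(poll_interval)


def problems(lines: Iterable[str]) -> Iterator[str]:
    """Keep only entries whose level field (the third token) is WARN or ERROR."""
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) >= 3 and parts[2] in LEVELS:
            yield line


# Usage (assumed default path for CloudStack 4.x on CentOS):
# for line in problems(follow("/var/log/cloudstack/management/management-server.log")):
#     print(line)
```

This filter only passes the first line of each entry, so stack-trace continuation lines are dropped; look them up in the full log once an ERROR shows up.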

Regards.


Re: [URGENT] cloudstack SSVM and router failed to start after power blackout

Posted by dimas yoga pratama <sm...@gmail.com>.
Hi, from the Infrastructure tab I can see the hosts, but both of them show
the Alert state. I already tried to force reconnect, but it fails.
What should I do? Now the CPVM fails to start too.


Re: [URGENT] cloudstack SSVM and router failed to start after power blackout

Posted by Rohit Yadav <bh...@apache.org>.
Hi,

SSVMs and VRs are stateless, so if restarting them doesn't work for you, you
may (force) stop and remove them. The CloudStack HA thread(s) will kick-start
new ones after a certain timeout; to speed this up you may restart
CloudStack as well.
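The force-stop-then-remove sequence maps onto API calls. A sketch, assuming the `stopRouter`/`destroyRouter` and `stopSystemVm`/`destroySystemVm` API commands with their `id` and `forced` parameters; the helper name is hypothetical, and it only builds the list of calls (each would still have to be sent as a signed API request).

```python
from typing import Dict, List, Tuple

# VM type -> (force-stop command, destroy command).
STOP_DESTROY = {
    "DomainRouter": ("stopRouter", "destroyRouter"),
    "SecondaryStorageVm": ("stopSystemVm", "destroySystemVm"),
    "ConsoleProxy": ("stopSystemVm", "destroySystemVm"),
}


def recreate_plan(vms: List[Tuple[str, str]]) -> List[Tuple[str, Dict[str, str]]]:
    """Return (command, params) pairs: force-stop, then destroy, each VM.

    `vms` holds (uuid, vm_type) pairs. Anything that is not a stateless
    system VM or router is rejected, since destroying it would lose data.
    """
    plan = []
    for uuid, vm_type in vms:
        if vm_type not in STOP_DESTROY:
            raise ValueError(f"refusing to destroy non-system VM {uuid} ({vm_type})")
        stop_cmd, destroy_cmd = STOP_DESTROY[vm_type]
        plan.append((stop_cmd, {"id": uuid, "forced": "true"}))
        plan.append((destroy_cmd, {"id": uuid}))
    return plan


# Placeholder UUIDs: look up the real ones in the UI or via the list APIs.
for command, params in recreate_plan([("<router-uuid>", "DomainRouter"),
                                      ("<ssvm-uuid>", "SecondaryStorageVm")]):
    print(command, params)
```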

If your problem still persists after trying the above, you may try debugging
the issue:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/SSVM,+templates,+Secondary+storage+troubleshooting

Regards.

