Posted to users@cloudstack.apache.org by Carlos Reategui <ca...@reategui.com> on 2013/10/29 05:05:44 UTC

Management Server won't connect after cluster shutdown and restart

Using CS 4.1.1 with 2 hosts running XS 6.0.2

Had to shut everything down and now I am having problems bringing things up.

As suggested, I used CS to stop all my instances as well as the system VMs
and the SR. Then I shut down the XS 6.0.2 servers after enabling maintenance
mode from the CS console.
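
For context, the XenServer side of that shutdown is roughly the following on
each host (a sketch only; the UUIDs are placeholders, and CS's maintenance mode
normally takes care of evacuation):

    xe host-disable uuid=<host-uuid>
    xe host-evacuate uuid=<host-uuid>    # only if any VMs were still running on it
    xe host-shutdown uuid=<host-uuid>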

After bringing things up, my XS servers had the infamous interface-rename
issue which I resolved by editing the udev rules file manually.

Now I have my XS servers up, but for some reason my pool master got changed,
so I used xe pool-designate-new-master to switch it back.
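
For anyone following along, checking and changing the master from the XenServer
side looks roughly like this (the host UUID is a placeholder):

    xe pool-list params=uuid,name-label,master      # which host XS currently considers master
    xe host-list params=uuid,name-label,address
    xe pool-designate-new-master host-uuid=<uuid-of-intended-master>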

I did not notice that this designation change had been picked up by CS, and
when starting it up it keeps trying to connect to the wrong pool master.
Should I switch XS to match CS, or what do I need to change in CS to tell
it which host the pool master is?

I tried putting the server that CS thinks is the master into maintenance mode
from CS, but that just ends up in an apparently infinite loop, spitting out
endless lines like these:

2013-10-28 20:39:02,059 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-1:null) Seq 2-855048230: Forwarding Seq 2-855048230:
{ Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255

2013-10-28 20:39:02,060 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-11:null) Seq 2-855048230: Forwarding Seq 2-855048230:
{ Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255

2013-10-28 20:39:02,062 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-13:null) Seq 2-855048230: Forwarding Seq 2-855048230:
{ Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255

2013-10-28 20:39:02,063 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-7:null) Seq 2-855048230: Forwarding Seq 2-855048230:
{ Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255

2013-10-28 20:39:02,064 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-15:null) Seq 2-855048230: Forwarding Seq 2-855048230:
{ Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255

2013-10-28 20:39:02,066 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-14:null) Seq 2-855048230: Forwarding Seq 2-855048230:
{ Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255

2013-10-28 20:39:02,067 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-2:null) Seq 2-855048230: Forwarding Seq 2-855048230:
{ Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255

2013-10-28 20:39:02,068 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-12:null) Seq 2-855048230: Forwarding Seq 2-855048230:
{ Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255

After stopping and restarting the MS, the first error I see is:

2013-10-28 20:41:53,749 DEBUG [cloud.api.ApiServlet] (catalina-exec-1:null)
===START===  10.110.3.70 -- GET
command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88&response=json&sessionkey=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624

2013-10-28 20:41:53,756 ERROR [cloud.api.ApiServlet] (catalina-exec-1:null)
unknown exception writing api response

java.lang.NullPointerException

        at com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.java:280)
        at com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.java:143)
        at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:238)
        at com.cloud.api.ApiServlet.doGet(ApiServlet.java:66)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:615)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
        at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:679)

2013-10-28 20:41:53,761 DEBUG [cloud.api.ApiServlet] (catalina-exec-1:null)
===END===  10.110.3.70 -- GET
command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88&response=json&sessionkey=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624

Then I see a few of these:

2013-10-28 20:42:01,464 WARN  [agent.manager.ClusteredAgentManagerImpl]
(HA-Worker-4:work-10) Unable to connect to peer management server:
233845174730255, ip: 172.30.45.2 due to Connection refused

java.net.ConnectException: Connection refused

        at sun.nio.ch.Net.connect(Native Method)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
        at java.nio.channels.SocketChannel.open(SocketChannel.java:164)
        at com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(ClusteredAgentManagerImpl.java:477)
        at com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttache.java:172)
        at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
        at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
        at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
        at com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigator.java:53)
        at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:434)
        at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:829)

2013-10-28 20:42:01,468 WARN  [agent.manager.ClusteredAgentManagerImpl]
(HA-Worker-2:work-11) Unable to connect to peer management server:
233845174730255, ip: 172.30.45.2 due to Connection refused

java.net.ConnectException: Connection refused

        at sun.nio.ch.Net.connect(Native Method)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
        at java.nio.channels.SocketChannel.open(SocketChannel.java:164)
        at com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(ClusteredAgentManagerImpl.java:477)
        at com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttache.java:172)
        at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
        at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
        at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
        at com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigator.java:53)
        at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:434)
        at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:829)


The next error is:

2013-10-28 20:42:01,845 WARN  [utils.nio.Task]
(AgentManager-Handler-6:null) Caught the following exception but pushing on

java.lang.NullPointerException

        at com.google.gson.FieldAttributes.getAnnotationFromArray(FieldAttributes.java:231)
        at com.google.gson.FieldAttributes.getAnnotation(FieldAttributes.java:150)
        at com.google.gson.VersionExclusionStrategy.shouldSkipField(VersionExclusionStrategy.java:38)
        at com.google.gson.DisjunctionExclusionStrategy.shouldSkipField(DisjunctionExclusionStrategy.java:38)
        at com.google.gson.ReflectingFieldNavigator.visitFieldsReflectively(ReflectingFieldNavigator.java:58)
        at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:120)
        at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:62)
        at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:53)
        at com.google.gson.Gson.toJsonTree(Gson.java:220)
        at com.google.gson.Gson.toJsonTree(Gson.java:197)
        at com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.java:56)
        at com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.java:37)
        at com.google.gson.JsonSerializationVisitor.findAndInvokeCustomSerializer(JsonSerializationVisitor.java:184)
        at com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:160)
        at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:101)
        at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:62)
        at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:53)
        at com.google.gson.Gson.toJsonTree(Gson.java:220)
        at com.google.gson.Gson.toJson(Gson.java:260)
        at com.cloud.agent.transport.Request.toBytes(Request.java:316)
        at com.cloud.agent.transport.Request.getBytes(Request.java:332)
        at com.cloud.agent.manager.ClusteredAgentManagerImpl.cancel(ClusteredAgentManagerImpl.java:435)
        at com.cloud.agent.manager.ClusteredAgentManagerImpl$ClusteredAgentHandler.doTask(ClusteredAgentManagerImpl.java:641)
        at com.cloud.utils.nio.Task.run(Task.java:83)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:679)

And then the next set of errors, which I see over and over, is:

2013-10-28 20:42:16,433 DEBUG [cloud.storage.StorageManagerImpl]
(StatsCollector-2:null) Unable to send storage pool command to
Pool[200|LVM] via 1

com.cloud.exception.OperationTimedoutException: Commands 1112277002 to Host
1 timed out after 3600

        at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
        at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
        at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
        at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:679)

2013-10-28 20:42:16,434 INFO  [cloud.server.StatsCollector]
(StatsCollector-2:null) Unable to reach Pool[200|LVM]

com.cloud.exception.StorageUnavailableException: Resource [StoragePool:200]
is unreachable: Unable to send command to the pool

        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
        at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:679)

I have tried forcing a reconnect to both hosts, but that ends up maxing out a
CPU core and filling the log file with endless lines.

Any thoughts on how to recover my system?

Re: Management Server won't connect after cluster shutdown and restart

Posted by Ian Duffy <ia...@ianduffy.ie>.
Ilya,

My case wasn't a generic CloudStack fault in the end (manual editing of the
database had occurred, putting things into an invalid state).

The others on this thread might be able to provide you with information
about your issues. I found that bumping the log level up to TRACE provided
much greater insight.
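
If it helps, raising the management server log level is a small change to
log4j-cloud.xml (the path varies by version; /etc/cloudstack/management/ is one
common location), followed by a management server restart. A minimal sketch:

    <!-- log4j-cloud.xml: raise the com.cloud logger from DEBUG to TRACE -->
    <category name="com.cloud">
      <priority value="TRACE"/>
    </category>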


On 30 August 2014 19:12, ilya musayev <il...@gmail.com> wrote:

> Can you tell us more, please?
>
> In my rather large environments, I may need to do several restarts for
> cloudstack to come up properly.
>
> Otherwise it complains that SSVM and CPVM are not ready to launch in Zone
> X.
>
> Thanks
> ilya
>
> On 8/30/14, 5:29 AM, Ian Duffy wrote:
>
>> Hi All,
>>
>> Thank you very much for the help.
>>
>> Ended up solving the issue. There was an invalid value in our configuration
>> table which seemed to prevent a lot of DAOs from being autowired.
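
For anyone hitting something similar, those values can be eyeballed straight
from the cloud database (a rough sketch; adjust the filter to whatever setting
you suspect):

    SELECT category, name, value
      FROM cloud.configuration
     WHERE value IS NULL OR value = ''
     ORDER BY category, name;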
>>
>>
>>
>>
>> On 29 August 2014 21:16, Paul Angus <pa...@shapeblue.com> wrote:
>>
>>  Hi Ian,
>>>
>>> I've seen this kind of behaviour before with KVM hosts reconnecting.
>>>
>>> There’s a SELECT … FOR UPDATE query on the op_ha_work table which locks
>>> the table, stopping other hosts updating their status. If there are a lot
>>> of entries in there, they all lock each other out. Deleting the entries
>>> fixed the problem, but you then have to deal with hosts and VMs being
>>> up/down yourself.
>>>
>>> So check the op_ha_work table for lots of entries, which can lock up the
>>> database. If you can check the database for the queries it's handling,
>>> that would be best.
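
A quick way to do that check against the cloud database (a sketch, assuming the
usual op_ha_work schema; deleting rows is the last resort described above and
means reconciling host/VM state by hand afterwards):

    SELECT type, COUNT(*) FROM cloud.op_ha_work GROUP BY type;
    SHOW FULL PROCESSLIST;   -- look for long-running locking reads on op_ha_work
    -- DELETE FROM cloud.op_ha_work;   -- only after a full database backup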
>>>
>>> Also check that the management server and the MySQL DB are tuned for the
>>> load that's being thrown at them
>>> (http://support.citrix.com/article/CTX132020).
>>> Remember that if you have other services such as Nagios or Puppet/Chef
>>> directly reading the DB, that adds to the number of connections into the
>>> MySQL DB - I have seen the management server starved of MySQL connections
>>> when a lot of hosts are brought back online.
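
Checking for that kind of connection starvation from the MySQL side is
straightforward (a sketch; the db.properties pool setting is quoted from
memory, so verify the name for your version):

    SHOW GLOBAL STATUS LIKE 'Threads_connected';
    SHOW GLOBAL STATUS LIKE 'Max_used_connections';
    SHOW VARIABLES LIKE 'max_connections';
    -- compare with db.cloud.maxActive in the management server's db.properties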
>>>
>>>
>>> Regards
>>>
>>> Paul Angus
>>> Cloud Architect
>>> S: +44 20 3603 0540 | M: +447711418784 | T: CloudyAngus
>>> paul.angus@shapeblue.com
>>>
>>> -----Original Message-----
>>> From: creategui@gmail.com [mailto:creategui@gmail.com] On Behalf Of
>>> Carlos Reategui
>>> Sent: 29 August 2014 20:55
>>> To: users@cloudstack.apache.org
>>> Subject: Re: Management Server won't connect after cluster shutdown and
>>> restart
>>>
>>> Hi Ian,
>>>
>>> So the root of the problem was that the machines were not started up in
>>> the correct order.
>>>
>>> My plan had been to stop all VMs from CS, then stop CS, then shut down the
>>> VM hosts.  Coming back up, the hosts needed to be brought up first; once
>>> they were OK, then bring up the CS machine and make sure everything was
>>> in the same state it thought things were in when it was shut down.
>>> Unfortunately CS came up before everything else was the way it expected
>>> it to be, and I did not realize that at the time.
>>>
>>> To resolve it, I went back to my CS db backup from right after I shut down
>>> the MS, made sure the VM hosts were all as expected, and then started the
>>> MS.
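
In case it helps anyone else, that recovery amounts to roughly the following
(a sketch; the service name and backup file are placeholders, and it assumes a
mysqldump of the cloud database taken at shutdown time):

    service cloudstack-management stop
    mysql -u cloud -p cloud < cloud-db-at-shutdown.sql
    # bring the XenServer hosts up and confirm the expected pool master first, then:
    service cloudstack-management start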
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Aug 29, 2014 at 8:02 AM, Ian Duffy <ia...@ianduffy.ie> wrote:
>>>
>>>  Hi carlos,
>>>>
>>>> Did you ever find a fix for this?
>>>>
>>>> I'm seeing the same issue on 4.1.1 with VMware ESXi.
>>>>
>>>>
>>>> On 29 October 2013 04:54, Carlos Reategui <cr...@gmail.com> wrote:
>>>>
>>>>>  Update.  I cleared out the async_job table and also reset the system
>>>>> VMs it thought were in Starting mode from my previous attempts by
>>>>> setting them from Starting to Stopped.  I also re-set the XS pool master
>>>>> to be the one XS thinks it is.
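
For reference, that clean-up corresponds roughly to the following against the
cloud database, with the management server stopped and a backup taken first (a
sketch, not an exact record of what was run):

    DELETE FROM cloud.async_job;
    UPDATE cloud.vm_instance SET state = 'Stopped'
     WHERE state = 'Starting'
       AND type IN ('SecondaryStorageVm', 'ConsoleProxy');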
>>>>>
>>>>> Now when I start the CS MS, here are the logs leading up to the first
>>>>> "Unable to reach the pool" exception:
>>>>>
>>>>> 2013-10-28 21:27:11,040 DEBUG [cloud.alert.ClusterAlertAdapter]
>>>>> (Cluster-Notification-1:null) Management server node 172.30.45.2 is
>>>>> up, send alert
>>>>>
>>>>> 2013-10-28 21:27:11,045 WARN  [cloud.cluster.ClusterManagerImpl]
>>>>> (Cluster-Notification-1:null) Notifying management server join event
>>>>> took 9 ms
>>>>>
>>>>> 2013-10-28 21:27:23,236 DEBUG [cloud.server.StatsCollector]
>>>>> (StatsCollector-2:null) HostStatsCollector is running...
>>>>>
>>>>> 2013-10-28 21:27:23,243 DEBUG [cloud.server.StatsCollector]
>>>>> (StatsCollector-3:null) VmStatsCollector is running...
>>>>>
>>>>> 2013-10-28 21:27:23,247 DEBUG [cloud.server.StatsCollector]
>>>>> (StatsCollector-1:null) StorageCollector is running...
>>>>>
>>>>> 2013-10-28 21:27:23,255 DEBUG [cloud.server.StatsCollector]
>>>>> (StatsCollector-1:null) There is no secondary storage VM for
>>>>> secondary storage host nfs://172.30.45.2/store/secondary
>>>>>
>>>>> 2013-10-28 21:27:23,273 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (StatsCollector-2:null) Seq 1-201916421: Forwarding null to 233845174730255
>>>>>
>>>>> 2013-10-28 21:27:23,274 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-9:null) Seq 1-201916421: Routing from 233845174730253
>>>>>
>>>>> 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-9:null) Seq 1-201916421: Link is closed
>>>>>
>>>>> 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-9:null) Seq 1-201916421: MgmtId 233845174730253:
>>>>> Req: Resource [Host:1] is unreachable: Host 1: Link is closed
>>>>>
>>>>> 2013-10-28 21:27:23,275 DEBUG
>>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-9:null) Seq 1--1: MgmtId 233845174730253: Req:
>>>>> Routing to peer
>>>>>
>>>>> 2013-10-28 21:27:23,277 DEBUG
>>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-11:null) Seq 1--1: MgmtId 233845174730253: Req:
>>>>> Cancel request received
>>>>>
>>>>> 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
>>>>> (AgentManager-Handler-11:null) Seq 1-201916421: Cancelling.
>>>>>
>>>>> 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-2:null) Seq 1-201916421: Waiting some more time
>>>>> because this is the current command
>>>>>
>>>>> 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-2:null) Seq 1-201916421: Waiting some more time
>>>>> because this is the current command
>>>>>
>>>>> 2013-10-28 21:27:23,277 INFO  [utils.exception.CSExceptionErrorCode]
>>>>> (StatsCollector-2:null) Could not find exception:
>>>>> com.cloud.exception.OperationTimedoutException in error code list
>>>>> for exceptions
>>>>>
>>>>> 2013-10-28 21:27:23,277 WARN  [agent.manager.AgentAttache]
>>>>> (StatsCollector-2:null) Seq 1-201916421: Timed out on null
>>>>>
>>>>> 2013-10-28 21:27:23,278 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-2:null) Seq 1-201916421: Cancelling.
>>>>>
>>>>> 2013-10-28 21:27:23,278 WARN  [agent.manager.AgentManagerImpl]
>>>>> (StatsCollector-2:null) Operation timed out: Commands 201916421 to
>>>>> Host 1 timed out after 3600
>>>>>
>>>>> 2013-10-28 21:27:23,278 WARN  [cloud.resource.ResourceManagerImpl]
>>>>> (StatsCollector-2:null) Unable to obtain host 1 statistics.
>>>>>
>>>>> 2013-10-28 21:27:23,278 WARN  [cloud.server.StatsCollector]
>>>>> (StatsCollector-2:null) Received invalid host stats for host: 1
>>>>>
>>>>> 2013-10-28 21:27:23,281 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (StatsCollector-1:null) Seq 1-201916422: Forwarding null to 233845174730255
>>>>>
>>>>> 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-12:null) Seq 1-201916422: Routing from
>>>>> 233845174730253
>>>>>
>>>>> 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-12:null) Seq 1-201916422: Link is closed
>>>>>
>>>>> 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-12:null) Seq 1-201916422: MgmtId 233845174730253:
>>>>> Req: Resource [Host:1] is unreachable: Host 1: Link is closed
>>>>>
>>>>> 2013-10-28 21:27:23,284 DEBUG
>>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-12:null) Seq 1--1: MgmtId 233845174730253: Req:
>>>>> Routing to peer
>>>>>
>>>>> 2013-10-28 21:27:23,286 DEBUG
>>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-13:null) Seq 1--1: MgmtId 233845174730253: Req:
>>>>> Cancel request received
>>>>>
>>>>> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
>>>>> (AgentManager-Handler-13:null) Seq 1-201916422: Cancelling.
>>>>>
>>>>> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-1:null) Seq 1-201916422: Waiting some more time
>>>>> because this is the current command
>>>>>
>>>>> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-1:null) Seq 1-201916422: Waiting some more time
>>>>> because this is the current command
>>>>>
>>>>> 2013-10-28 21:27:23,286 INFO  [utils.exception.CSExceptionErrorCode]
>>>>> (StatsCollector-1:null) Could not find exception:
>>>>> com.cloud.exception.OperationTimedoutException in error code list
>>>>> for exceptions
>>>>>
>>>>> 2013-10-28 21:27:23,286 WARN  [agent.manager.AgentAttache]
>>>>> (StatsCollector-1:null) Seq 1-201916422: Timed out on null
>>>>>
>>>>> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-1:null) Seq 1-201916422: Cancelling.
>>>>>
>>>>> 2013-10-28 21:27:23,288 DEBUG [cloud.storage.StorageManagerImpl]
>>>>> (StatsCollector-1:null) Unable to send storage pool command to
>>>>> Pool[200|LVM] via 1
>>>>>
>>>>> com.cloud.exception.OperationTimedoutException: Commands 201916422 to
>>>>> Host 1 timed out after 3600
>>>>>
>>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>>
>>>>> 2013-10-28 21:27:23,289 INFO  [cloud.server.StatsCollector]
>>>>> (StatsCollector-1:null) Unable to reach Pool[200|LVM]
>>>>>
>>>>> com.cloud.exception.StorageUnavailableException: Resource [StoragePool:200]
>>>>> is unreachable: Unable to send command to the pool
>>>>>
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>>
>>>>> 2013-10-28 21:27:23,300 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (StatsCollector-2:null) Seq 2-1168703496: Forwarding null to
>>>>> 233845174730255
>>>>>
>>>>> 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-14:null) Seq 2-1168703496: Routing from
>>>>> 233845174730253
>>>>>
>>>>> 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-14:null) Seq 2-1168703496: Link is closed
>>>>>
>>>>> 2013-10-28 21:27:23,302 DEBUG [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-14:null) Seq 2-1168703496: MgmtId 233845174730253:
>>>>> Req: Resource [Host:2] is unreachable: Host 2: Link is closed
>>>>>
>>>>> 2013-10-28 21:27:23,302 DEBUG
>>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-14:null) Seq 2--1: MgmtId 233845174730253: Req:
>>>>> Routing to peer
>>>>>
>>>>> 2013-10-28 21:27:23,303 DEBUG
>>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-15:null) Seq 2--1: MgmtId 233845174730253: Req:
>>>>> Cancel request received
>>>>>
>>>>> 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
>>>>> (AgentManager-Handler-15:null) Seq 2-1168703496: Cancelling.
>>>>>
>>>>> 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time
>>>>> because this is the current command
>>>>>
>>>>> 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time
>>>>> because this is the current command
>>>>>
>>>>> 2013-10-28 21:27:23,304 INFO  [utils.exception.CSExceptionErrorCode]
>>>>> (StatsCollector-2:null) Could not find exception:
>>>>> com.cloud.exception.OperationTimedoutException in error code list
>>>>> for exceptions
>>>>>
>>>>> 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentAttache]
>>>>> (StatsCollector-2:null) Seq 2-1168703496: Timed out on null
>>>>>
>>>>> 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-2:null) Seq 2-1168703496: Cancelling.
>>>>>
>>>>> 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentManagerImpl]
>>>>> (StatsCollector-2:null) Operation timed out: Commands 1168703496 to
>>>>> Host 2 timed out after 3600
>>>>>
>>>>> 2013-10-28 21:27:23,304 WARN  [cloud.resource.ResourceManagerImpl]
>>>>> (StatsCollector-2:null) Unable to obtain host 2 statistics.
>>>>>
>>>>> 2013-10-28 21:27:23,304 WARN  [cloud.server.StatsCollector]
>>>>> (StatsCollector-2:null) Received invalid host stats for host: 2
>>>>>
>>>>> 2013-10-28 21:27:23,307 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (StatsCollector-1:null) Seq 2-1168703497: Forwarding null to
>>>>> 233845174730255
>>>>>
>>>>> 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-1:null) Seq 2-1168703497: Routing from
>>>>> 233845174730253
>>>>>
>>>>> 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-1:null) Seq 2-1168703497: Link is closed
>>>>>
>>>>> 2013-10-28 21:27:23,308 DEBUG
>>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-1:null) Seq 2-1168703497: MgmtId 233845174730253:
>>>>> Req: Resource [Host:2] is unreachable: Host 2: Link is closed
>>>>>
>>>>> 2013-10-28 21:27:23,308 DEBUG
>>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-1:null) Seq 2--1: MgmtId 233845174730253: Req:
>>>>> Routing to peer
>>>>>
>>>>> 2013-10-28 21:27:23,310 DEBUG [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-5:null) Seq 2--1: MgmtId 233845174730253: Req:
>>>>> Cancel request received
>>>>>
>>>>> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
>>>>> (AgentManager-Handler-5:null) Seq 2-1168703497: Cancelling.
>>>>>
>>>>> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time
>>>>> because this is the current command
>>>>>
>>>>> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time
>>>>> because this is the current command
>>>>>
>>>>> 2013-10-28 21:27:23,310 INFO  [utils.exception.CSExceptionErrorCode]
>>>>> (StatsCollector-1:null) Could not find exception:
>>>>> com.cloud.exception.OperationTimedoutException in error code list
>>>>> for exceptions
>>>>>
>>>>> 2013-10-28 21:27:23,310 WARN  [agent.manager.AgentAttache]
>>>>> (StatsCollector-1:null) Seq 2-1168703497: Timed out on null
>>>>>
>>>>> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-1:null) Seq 2-1168703497: Cancelling.
>>>>>
>>>>> 2013-10-28 21:27:23,310 DEBUG [cloud.storage.StorageManagerImpl]
>>>>> (StatsCollector-1:null) Unable to send storage pool command to
>>>>> Pool[201|LVM] via 2
>>>>>
>>>>> com.cloud.exception.OperationTimedoutException: Commands 1168703497 to
>>>>> Host 2 timed out after 3600
>>>>>
>>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>>
>>>>> 2013-10-28 21:27:23,311 INFO  [cloud.server.StatsCollector]
>>>>> (StatsCollector-1:null) Unable to reach Pool[201|LVM]
>>>>>
>>>>> com.cloud.exception.StorageUnavailableException: Resource [StoragePool:201]
>>>>> is unreachable: Unable to send command to the pool
>>>>>
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>>
>>>>> 2013-10-28 21:27:23,328 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (StatsCollector-1:null) Seq 2-1168703498: Forwarding null to
>>>>> 233845174730255
>>>>>
>>>>> 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-3:null) Seq 2-1168703498: Routing from
>>>>> 233845174730253
>>>>>
>>>>> 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-3:null) Seq 2-1168703498: Link is closed
>>>>>
>>>>> 2013-10-28 21:27:23,329 DEBUG
>>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-3:null) Seq 2-1168703498: MgmtId 233845174730253:
>>>>> Req: Resource [Host:2] is unreachable: Host 2: Link is closed
>>>>>
>>>>> 2013-10-28 21:27:23,330 DEBUG
>>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-3:null) Seq 2--1: MgmtId 233845174730253: Req:
>>>>> Routing to peer
>>>>>
>>>>> 2013-10-28 21:27:23,331 DEBUG [agent.manager.ClusteredAgentManagerImpl]
>>>>> (AgentManager-Handler-4:null) Seq 2--1: MgmtId 233845174730253: Req:
>>>>> Cancel request received
>>>>>
>>>>> 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
>>>>> (AgentManager-Handler-4:null) Seq 2-1168703498: Cancelling.
>>>>>
>>>>> 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time
>>>>> because this is the current command
>>>>>
>>>>> 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time
>>>>> because this is the current command
>>>>>
>>>>> 2013-10-28 21:27:23,331 INFO  [utils.exception.CSExceptionErrorCode]
>>>>> (StatsCollector-1:null) Could not find exception:
>>>>> com.cloud.exception.OperationTimedoutException in error code list
>>>>> for exceptions
>>>>>
>>>>> 2013-10-28 21:27:23,332 WARN  [agent.manager.AgentAttache]
>>>>> (StatsCollector-1:null) Seq 2-1168703498: Timed out on null
>>>>>
>>>>> 2013-10-28 21:27:23,332 DEBUG [agent.manager.AgentAttache]
>>>>> (StatsCollector-1:null) Seq 2-1168703498: Cancelling.
>>>>>
>>>>> 2013-10-28 21:27:23,332 DEBUG [cloud.storage.StorageManagerImpl]
>>>>> (StatsCollector-1:null) Unable to send storage pool command to
>>>>> Pool[202|NetworkFilesystem] via 2
>>>>>
>>>>> com.cloud.exception.OperationTimedoutException: Commands 1168703498 to
>>>>> Host 2 timed out after 3600
>>>>>
>>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>>
>>>>> iptables is disabled on the XS hosts, so the connection problem is not
>>>>> a firewall issue.
>>>>>
>>>>> If I do an xe sr-list I see all 3 of the above SRs, and the hosts
>>>>> have mounted the NFS SR and can access it.
>>>>>
>>>>>
>>>>>
>>>>>
>

Re: Management Server won't connect after cluster shutdown and restart

Posted by ilya musayev <il...@gmail.com>.
Can you tell us more, please?

In my rather large environments, I may need to restart CloudStack several
times for it to come up properly.

Otherwise it complains that the SSVM and CPVM are not ready to launch in Zone X.
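A quick way to see what state the management server thinks the system VMs are
in while it is coming up is to look at the database and the log directly. This
is only a sketch: the vm_instance table/column names are as I remember them
from the 4.x 'cloud' schema, and the log path assumes a stock package install.

    # Show the SSVM/CPVM rows as the management server sees them:
    mysql -u cloud -p cloud -e "SELECT id, name, type, state, host_id
        FROM vm_instance
        WHERE type IN ('SecondaryStorageVm','ConsoleProxy') AND removed IS NULL;"

    # Watch the system VM startup attempts in the management server log:
    tail -f /var/log/cloudstack/management/management-server.log \
        | grep -Ei 'secondary storage|console proxy'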

Thanks
ilya
On 8/30/14, 5:29 AM, Ian Duffy wrote:
> Hi All,
>
> Thank you very much for the help.
>
> Ended up solving the issue. There was an invalid value in our configuration
> table which seemed to prevent a lot of DAOs from being autowired.
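For anyone else who hits this: one way to find a bad row is to dump the
configuration table and diff it against a known-good copy (a DB backup or
another working install). A sketch only, assuming the stock 'cloud' schema:

    # Dump the global settings so they can be diffed against a known-good copy:
    mysql -u cloud -p cloud -e "SELECT category, name, value
        FROM configuration ORDER BY category, name;" > configuration-current.txt

    # Compare with a dump taken from a backup or a healthy installation:
    diff configuration-current.txt configuration-known-good.txt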
>
>
>
>
> On 29 August 2014 21:16, Paul Angus <pa...@shapeblue.com> wrote:
>
>> Hi Ian,
>>
>> I've seen this kind of behaviour before with KVM hosts reconnecting.
>>
>> There’s a SELECT … FOR UPDATE query on the op_ha_work table which locks
>> the table, stopping other hosts from updating their status. If there are a lot
>> of entries in there they all lock each other out. Deleting the entries
>> fixed the problem, but you then have to deal with hosts and VMs being up/down
>> yourself.
>>
>> So check the op_ha_work table for lots of entries which can lock up the
>> database. If you can check the database for the queries that it's handling
>> - that would be best.
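Something like the following is the kind of check being described; a sketch
only (table/column names from a 4.x 'cloud' schema), and take a DB backup
before deleting anything.

    # How much HA work is queued, and for which hosts?
    mysql -u cloud -p cloud -e "SELECT host_id, type, step, COUNT(*) AS entries
        FROM op_ha_work GROUP BY host_id, type, step;"

    # See which queries the database is actually handling while hosts reconnect:
    mysql -u root -p -e "SHOW FULL PROCESSLIST;"

    # Last resort, as described above -- only after a backup, and accepting that
    # host/VM state then has to be reconciled by hand:
    # mysql -u cloud -p cloud -e "DELETE FROM op_ha_work;"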
>>
>> Also check that the management server and MySQL DB are tuned for the load
>> that is being thrown at them.
>> (http://support.citrix.com/article/CTX132020)
>> Remember if you have other services such as Nagios or puppet/chef directly
>> reading the DB, that adds to the number of connections into the mysql db -
>> I have seen the management server starved of mysql connections when a lot
>> of hosts are brought back online.
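To check whether the management server is actually being starved of
connections, the standard MySQL counters are usually enough (a sketch;
credentials per your own setup):

    # Connection limit vs. what has actually been used since the last restart:
    mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connections';
        SHOW STATUS LIKE 'Threads_connected';
        SHOW STATUS LIKE 'Max_used_connections';"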
>>
>>
>> Regards
>>
>> Paul Angus
>> Cloud Architect
>> S: +44 20 3603 0540 | M: +447711418784 | T: CloudyAngus
>> paul.angus@shapeblue.com
>>
>> -----Original Message-----
>> From: creategui@gmail.com [mailto:creategui@gmail.com] On Behalf Of
>> Carlos Reategui
>> Sent: 29 August 2014 20:55
>> To: users@cloudstack.apache.org
>> Subject: Re: Management Server won't connect after cluster shutdown and
>> restart
>>
>> Hi Ian,
>>
>> So the root of the problem was that the machines were not started up in
>> the correct order.
>>
>> My plan had been to stop all VMs from CS, then stop CS, then shut down the
>> VM hosts.  On the way back up, the hosts needed to be brought up first, and
>> only once they were OK should the CS machine have been started, with
>> everything in the same state CS thought it was in when it was shut down.
>> Unfortunately CS came up before everything else was the way it expected
>> it to be, and I did not realize that at the time.
>>
>> To resolve it, I went back to my CS db backup from right after I shut down
>> the MS, made sure the VM hosts were all as expected, and then started the MS.
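For reference, that order boils down to something like the following. This is
a sketch only: the service name is from a 4.1-era package install, and the
backup/restore step is only needed if state drifted while CS was down.

    # --- shutdown ---
    # 1. From the CS UI: stop instances and system VMs, put the hosts in maintenance.
    # 2. Stop the management server and snapshot the DB as a known-good state:
    service cloudstack-management stop
    mysqldump -u root -p cloud > cloud-db-before-shutdown.sql
    # 3. Only now power off the XenServer hosts.

    # --- startup ---
    # 1. Bring the XenServer hosts up first; confirm pool master, SRs and networks
    #    look the way CloudStack last saw them (xe host-list, xe sr-list, xe pif-list).
    # 2. If anything drifted, restore the matching DB snapshot:
    # mysql -u root -p cloud < cloud-db-before-shutdown.sql
    # 3. Then start the management server:
    service cloudstack-management start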
>>
>>
>>
>>
>>
>>
>> On Fri, Aug 29, 2014 at 8:02 AM, Ian Duffy <ia...@ianduffy.ie> wrote:
>>
>>> Hi carlos,
>>>
>>> Did you ever find a fix for this?
>>>
>>> I'm seeing a same issue on 4.1.1 with Vmware ESXi.
>>>
>>>
>>> On 29 October 2013 04:54, Carlos Reategui <cr...@gmail.com> wrote:
>>>
>>>> Update.  I cleared out the async_job table and also reset the system VMs
>>>> it thought were in Starting state from my previous attempts by setting them
>>>> to Stopped.  I also re-set the XS pool master to be the
>>>> one XS thinks it is.
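That cleanup boils down to something like this; a sketch that assumes the
stock 'cloud' schema, to be run only with the management server stopped and a
fresh DB backup in hand.

    service cloudstack-management stop
    mysqldump -u root -p cloud > cloud-db-backup.sql

    # Drop stale async jobs left over from the failed attempts:
    mysql -u root -p cloud -e "DELETE FROM async_job;"

    # Put system VMs stuck in 'Starting' back to 'Stopped':
    mysql -u root -p cloud -e "UPDATE vm_instance SET state='Stopped'
        WHERE state='Starting' AND type <> 'User' AND removed IS NULL;"

    # On the XenServer side, make the pool master match what CS expects
    # (or vice versa), then bring the management server back:
    xe pool-designate-new-master host-uuid=<uuid-of-intended-master>
    service cloudstack-management start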
>>>>
>>>> Now when I start the CS MS here are the logs leading up to the first
>>>> exception about the Unable to reach the pool:
>>>>
>>>> 2013-10-28 21:27:11,040 DEBUG [cloud.alert.ClusterAlertAdapter]
>>>> (Cluster-Notification-1:null) Management server node 172.30.45.2 is
>>>> up, send alert
>>>>
>>>> 2013-10-28 21:27:11,045 WARN  [cloud.cluster.ClusterManagerImpl]
>>>> (Cluster-Notification-1:null) Notifying management server join event
>>> took 9
>>>> ms
>>>>
>>>> 2013-10-28 21:27:23,236 DEBUG [cloud.server.StatsCollector]
>>>> (StatsCollector-2:null) HostStatsCollector is running...
>>>>
>>>> 2013-10-28 21:27:23,243 DEBUG [cloud.server.StatsCollector]
>>>> (StatsCollector-3:null) VmStatsCollector is running...
>>>>
>>>> 2013-10-28 21:27:23,247 DEBUG [cloud.server.StatsCollector]
>>>> (StatsCollector-1:null) StorageCollector is running...
>>>>
>>>> 2013-10-28 21:27:23,255 DEBUG [cloud.server.StatsCollector]
>>>> (StatsCollector-1:null) There is no secondary storage VM for
>>>> secondary storage host nfs://172.30.45.2/store/secondary
>>>>
>>>> 2013-10-28 21:27:23,273 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (StatsCollector-2:null) Seq 1-201916421: Forwarding null to
>>> 233845174730255
>>>> 2013-10-28 21:27:23,274 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (AgentManager-Handler-9:null) Seq 1-201916421: Routing from
>>> 233845174730253
>>>> 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (AgentManager-Handler-9:null) Seq 1-201916421: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,275 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-9:null) Seq 1-201916421: MgmtId 233845174730253: Req:
>>>> Resource [Host:1] is unreachable: Host 1: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,275 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-9:null) Seq 1--1: MgmtId 233845174730253: Req:
>>>> Routing to peer
>>>>
>>>> 2013-10-28 21:27:23,277 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-11:null) Seq 1--1: MgmtId 233845174730253: Req:
>>>> Cancel request received
>>>>
>>>> 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
>>>> (AgentManager-Handler-11:null) Seq 1-201916421: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-2:null) Seq 1-201916421: Waiting some more time
>>>> because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-2:null) Seq 1-201916421: Waiting some more time
>>>> because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,277 INFO  [utils.exception.CSExceptionErrorCode]
>>>> (StatsCollector-2:null) Could not find exception:
>>>> com.cloud.exception.OperationTimedoutException in error code list
>>>> for exceptions
>>>>
>>>> 2013-10-28 21:27:23,277 WARN  [agent.manager.AgentAttache]
>>>> (StatsCollector-2:null) Seq 1-201916421: Timed out on null
>>>>
>>>> 2013-10-28 21:27:23,278 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-2:null) Seq 1-201916421: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,278 WARN  [agent.manager.AgentManagerImpl]
>>>> (StatsCollector-2:null) Operation timed out: Commands 201916421 to
>>>> Host 1 timed out after 3600
>>>>
>>>> 2013-10-28 21:27:23,278 WARN  [cloud.resource.ResourceManagerImpl]
>>>> (StatsCollector-2:null) Unable to obtain host 1 statistics.
>>>>
>>>> 2013-10-28 21:27:23,278 WARN  [cloud.server.StatsCollector]
>>>> (StatsCollector-2:null) Received invalid host stats for host: 1
>>>>
>>>> 2013-10-28 21:27:23,281 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (StatsCollector-1:null) Seq 1-201916422: Forwarding null to
>>> 233845174730255
>>>> 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (AgentManager-Handler-12:null) Seq 1-201916422: Routing from
>>>> 233845174730253
>>>>
>>>> 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (AgentManager-Handler-12:null) Seq 1-201916422: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,283 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-12:null) Seq 1-201916422: MgmtId 233845174730253:
>>>> Req: Resource [Host:1] is unreachable: Host 1: Link is
>>>>
>>>> closed
>>>>
>>>> 2013-10-28 21:27:23,284 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-12:null) Seq 1--1: MgmtId 233845174730253: Req:
>>>> Routing to peer
>>>>
>>>> 2013-10-28 21:27:23,286 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-13:null) Seq 1--1: MgmtId 233845174730253: Req:
>>>> Cancel request received
>>>>
>>>> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
>>>> (AgentManager-Handler-13:null) Seq 1-201916422: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-1:null) Seq 1-201916422: Waiting some more time
>>>> because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-1:null) Seq 1-201916422: Waiting some more time
>>>> because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,286 INFO  [utils.exception.CSExceptionErrorCode]
>>>> (StatsCollector-1:null) Could not find exception:
>>>> com.cloud.exception.OperationTimedoutException in error code list
>>>> for exceptions
>>>>
>>>> 2013-10-28 21:27:23,286 WARN  [agent.manager.AgentAttache]
>>>> (StatsCollector-1:null) Seq 1-201916422: Timed out on null
>>>>
>>>> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-1:null) Seq 1-201916422: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,288 DEBUG [cloud.storage.StorageManagerImpl]
>>>> (StatsCollector-1:null) Unable to send storage pool command to
>>>> Pool[200|LVM] via 1
>>>>
>>>> com.cloud.exception.OperationTimedoutException: Commands 201916422 to Host
>>>> 1 timed out after 3600
>>>>
>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>
>>>> 2013-10-28 21:27:23,289 INFO  [cloud.server.StatsCollector]
>>>> (StatsCollector-1:null) Unable to reach Pool[200|LVM]
>>>>
>>>> com.cloud.exception.StorageUnavailableException: Resource [StoragePool:200]
>>>> is unreachable: Unable to send command to the pool
>>>>
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>
>>>> 2013-10-28 21:27:23,300 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (StatsCollector-2:null) Seq 2-1168703496: Forwarding null to
>>>> 233845174730255
>>>>
>>>> 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (AgentManager-Handler-14:null) Seq 2-1168703496: Routing from
>>>> 233845174730253
>>>>
>>>> 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (AgentManager-Handler-14:null) Seq 2-1168703496: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,302 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-14:null) Seq 2-1168703496: MgmtId
>> 233845174730253:
>>>> Req: Resource [Host:2] is unreachable: Host 2: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,302 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-14:null) Seq 2--1: MgmtId 233845174730253: Req:
>>>> Routing to peer
>>>>
>>>> 2013-10-28 21:27:23,303 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-15:null) Seq 2--1: MgmtId 233845174730253: Req:
>>>> Cancel request received
>>>>
>>>> 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
>>>> (AgentManager-Handler-15:null) Seq 2-1168703496: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time
>>>> because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time
>>>> because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,304 INFO  [utils.exception.CSExceptionErrorCode]
>>>> (StatsCollector-2:null) Could not find exception:
>>>> com.cloud.exception.OperationTimedoutException in error code list
>>>> for exceptions
>>>>
>>>> 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentAttache]
>>>> (StatsCollector-2:null) Seq 2-1168703496: Timed out on null
>>>>
>>>> 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-2:null) Seq 2-1168703496: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentManagerImpl]
>>>> (StatsCollector-2:null) Operation timed out: Commands 1168703496 to
>>>> Host
>>> 2
>>>> timed out after 3600
>>>>
>>>> 2013-10-28 21:27:23,304 WARN  [cloud.resource.ResourceManagerImpl]
>>>> (StatsCollector-2:null) Unable to obtain host 2 statistics.
>>>>
>>>> 2013-10-28 21:27:23,304 WARN  [cloud.server.StatsCollector]
>>>> (StatsCollector-2:null) Received invalid host stats for host: 2
>>>>
>>>> 2013-10-28 21:27:23,307 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (StatsCollector-1:null) Seq 2-1168703497: Forwarding null to
>>>> 233845174730255
>>>>
>>>> 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (AgentManager-Handler-1:null) Seq 2-1168703497: Routing from
>>>> 233845174730253
>>>>
>>>> 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (AgentManager-Handler-1:null) Seq 2-1168703497: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,308 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-1:null) Seq 2-1168703497: MgmtId 233845174730253:
>>>> Req: Resource [Host:2] is unreachable: Host 2: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,308 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-1:null) Seq 2--1: MgmtId 233845174730253: Req:
>>>> Routing to peer
>>>>
>>>> 2013-10-28 21:27:23,310 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-5:null) Seq 2--1: MgmtId 233845174730253: Req:
>>> Cancel
>>>> request received
>>>>
>>>> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
>>>> (AgentManager-Handler-5:null) Seq 2-1168703497: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time
>>>> because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time
>>>> because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,310 INFO  [utils.exception.CSExceptionErrorCode]
>>>> (StatsCollector-1:null) Could not find exception:
>>>> com.cloud.exception.OperationTimedoutException in error code list
>>>> for exceptions
>>>>
>>>> 2013-10-28 21:27:23,310 WARN  [agent.manager.AgentAttache]
>>>> (StatsCollector-1:null) Seq 2-1168703497: Timed out on null
>>>>
>>>> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-1:null) Seq 2-1168703497: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,310 DEBUG [cloud.storage.StorageManagerImpl]
>>>> (StatsCollector-1:null) Unable to send storage pool command to
>>>> Pool[201|LVM] via 2
>>>>
>>>> com.cloud.exception.OperationTimedoutException: Commands 1168703497 to Host
>>>> 2 timed out after 3600
>>>>
>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>
>>>> 2013-10-28 21:27:23,311 INFO  [cloud.server.StatsCollector]
>>>> (StatsCollector-1:null) Unable to reach Pool[201|LVM]
>>>>
>>>> com.cloud.exception.StorageUnavailableException: Resource [StoragePool:201]
>>>> is unreachable: Unable to send command to the pool
>>>>
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>
>>>> 2013-10-28 21:27:23,328 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (StatsCollector-1:null) Seq 2-1168703498: Forwarding null to
>>>> 233845174730255
>>>>
>>>> 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (AgentManager-Handler-3:null) Seq 2-1168703498: Routing from
>>>> 233845174730253
>>>>
>>>> 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (AgentManager-Handler-3:null) Seq 2-1168703498: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,329 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-3:null) Seq 2-1168703498: MgmtId 233845174730253:
>>>> Req: Resource [Host:2] is unreachable: Host 2: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,330 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-3:null) Seq 2--1: MgmtId 233845174730253: Req:
>>>> Routing to peer
>>>>
>>>> 2013-10-28 21:27:23,331 DEBUG
>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>> (AgentManager-Handler-4:null) Seq 2--1: MgmtId 233845174730253: Req:
>>> Cancel
>>>> request received
>>>>
>>>> 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
>>>> (AgentManager-Handler-4:null) Seq 2-1168703498: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time
>>>> because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time
>>>> because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,331 INFO  [utils.exception.CSExceptionErrorCode]
>>>> (StatsCollector-1:null) Could not find exception:
>>>> com.cloud.exception.OperationTimedoutException in error code list
>>>> for exceptions
>>>>
>>>> 2013-10-28 21:27:23,332 WARN  [agent.manager.AgentAttache]
>>>> (StatsCollector-1:null) Seq 2-1168703498: Timed out on null
>>>>
>>>> 2013-10-28 21:27:23,332 DEBUG [agent.manager.AgentAttache]
>>>> (StatsCollector-1:null) Seq 2-1168703498: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,332 DEBUG [cloud.storage.StorageManagerImpl]
>>>> (StatsCollector-1:null) Unable to send storage pool command to
>>>> Pool[202|NetworkFilesystem] via 2
>>>>
>>>> com.cloud.exception.OperationTimedoutException: Commands 1168703498 to Host
>>>> 2 timed out after 3600
>>>>
>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>
>>>> iptables is disabled on the XS hosts, so the connection problem is not a
>>>> firewall issue.
>>>>
>>>> If I do an xe sr-list I see all 3 of the above SRs, and the hosts
>>>> have mounted the NFS SR and can access it.
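The host-side checks that go with this are plain XenServer CLI; a quick
sketch, run from the pool master:

    # List the SRs the pool knows about:
    xe sr-list params=uuid,name-label,type

    # For a given SR, confirm its PBDs are plugged on every host:
    xe pbd-list sr-uuid=<sr-uuid> params=host-uuid,currently-attached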
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Oct 28, 2013 at 9:05 PM, Carlos Reategui
>>>> <carlos@reategui.com
>>>>> wrote:
>>>>> Using CS 4.1.1 with 2 hosts running XS 6.0.2
>>>>>
>>>>> Had to shut everything down and now I am having problems bringing
>>> things
>>>>> up.
>>>>>
>>>>> As suggested I used CS to stop all my instances as well as the
>>>>> system
>>> VMs
>>>>> and the SR. Then I shutdown the XS 6.02 servers after enabling
>>>> maintenance
>>>>> mode from the CS console.
>>>>>
>>>>> After bringing things up, my XS servers had the infamous
>>> interface-rename
>>>>> issue which I resolved by editing the udev rules file manually.
>>>>>
>>>>> Now I have my XS servers up but for some reason my pool master got
>>>> changed
>>>>> so I used xe pool-designate-new-master to switch it back.
>>>>>
>>>>> I did not notice that this designation change had been picked up
>>>>> by CS
>>>> and
>>>>> when starting it up it keeps trying to connect to the wrong pool
>>> master.
>>>>>   Should I switch XS to match CS or what do I need to change in CS
>>>>> to
>>> tell
>>>>> it what the pool master is?
>>>>>
>>>>> I tried putting the server that CS thinks is the master in
>>>>> maintenance mode from CS but that just ends up in an apparent
>>>>> infinite cycle
>>> spitting
>>>>> out endless lines like these:
>>>>>
>>>>> 2013-10-28 20:39:02,059 DEBUG
>>>>> [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-1:null) Seq 2-855048230: Forwarding Seq
>>>> 2-855048230:
>>>>> { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
>>>>>
>>>>> : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> 2013-10-28 20:39:02,060 DEBUG
>>>>> [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-11:null) Seq 2-855048230: Forwarding Seq
>>>>> 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
>>>>> Flag
>>>>>
>>>>> s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> 2013-10-28 20:39:02,062 DEBUG
>>>>> [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-13:null) Seq 2-855048230: Forwarding Seq
>>>>> 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
>>>>> Flag
>>>>>
>>>>> s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> 2013-10-28 20:39:02,063 DEBUG
>>>>> [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-7:null) Seq 2-855048230: Forwarding Seq
>>>> 2-855048230:
>>>>> { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
>>>>>
>>>>> : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> 2013-10-28 20:39:02,064 DEBUG
>>>>> [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-15:null) Seq 2-855048230: Forwarding Seq
>>>>> 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
>>>>> Flag
>>>>>
>>>>> s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> 2013-10-28 20:39:02,066 DEBUG
>>>>> [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-14:null) Seq 2-855048230: Forwarding Seq
>>>>> 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
>>>>> Flag
>>>>>
>>>>> s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> 2013-10-28 20:39:02,067 DEBUG
>>>>> [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-2:null) Seq 2-855048230: Forwarding Seq
>>>> 2-855048230:
>>>>> { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
>>>>>
>>>>> : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> 2013-10-28 20:39:02,068 DEBUG
>>>>> [agent.manager.ClusteredAgentAttache]
>>>>> (AgentManager-Handler-12:null) Seq 2-855048230: Forwarding Seq
>>>>> 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
>>>>> Flag
>>>>>
>>>>> s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> After stopping and restarting the MS, the first error I see is:
>>>>>
>>>>> 2013-10-28 20:41:53,749 DEBUG [cloud.api.ApiServlet]
>>>>> (catalina-exec-1:null) ===START===  10.110.3.70 -- GET
>>>>>
>>> command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88
>>> &response=json&sessi
>>>>> onkey=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624
>>>>>
>>>>> 2013-10-28 20:41:53,756 ERROR [cloud.api.ApiServlet]
>>>>> (catalina-exec-1:null) unknown exception writing api response
>>>>>
>>>>> java.lang.NullPointerException
>>>>>
>>>>>          at com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.java:280)
>>>>>          at com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.java:143)
>>>>>          at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:238)
>>>>>          at com.cloud.api.ApiServlet.doGet(ApiServlet.java:66)
>>>>>          at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>>>>          at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>>>>          at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>>>>          at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>>          at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>>          at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>>>          at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>>>>          at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>>>          at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:615)
>>>>>          at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>>>          at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>>>>>          at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
>>>>>          at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
>>>>>          at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2282)
>>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>>
>>>>> 2013-10-28 20:41:53,761 DEBUG [cloud.api.ApiServlet]
>>>>> (catalina-exec-1:null) ===END===  10.110.3.70 -- GET
>>>>>
>>> command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88
>>> &response=json&session
>>>>> key=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624
>>>>>
>>>>> Then I see a few of these:
>>>>>
>>>>> 2013-10-28 20:42:01,464 WARN
>>>>> [agent.manager.ClusteredAgentManagerImpl]
>>>>> (HA-Worker-4:work-10) Unable to connect to peer management server:
>>>>> 233845174730255, ip: 172.30.45.2 due to Connection refused
>>>>>
>>>>> java.net.ConnectException: Connection refused
>>>>>
>>>>>          at sun.nio.ch.Net.connect(Native Method)
>>>>>          at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
>>>>>          at java.nio.channels.SocketChannel.open(SocketChannel.java:164)
>>>>>          at com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(ClusteredAgentManagerImpl.java:477)
>>>>>          at com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttache.java:172)
>>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>>          at com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigator.java:53)
>>>>>          at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:434)
>>>>>          at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:829)
>>>>>
>>>>> 2013-10-28 20:42:01,468 WARN  [agent.manager.ClusteredAgentManagerImpl]
>>>>> (HA-Worker-2:work-11) Unable to connect to peer management server:
>>>>> 233845174730255, ip: 172.30.45.2 due to Connection refused
>>>>>
>>>>> java.net.ConnectException: Connection refused
>>>>>
>>>>>          at sun.nio.ch.Net.connect(Native Method)
>>>>>          at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
>>>>>          at java.nio.channels.SocketChannel.open(SocketChannel.java:164)
>>>>>          at com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(ClusteredAgentManagerImpl.java:477)
>>>>>          at com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttache.java:172)
>>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>>          at com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigator.java:53)
>>>>>          at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:434)
>>>>>          at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:829)
>>>>>
>>>>> The next error is:
>>>>>
>>>>> 2013-10-28 20:42:01,845 WARN  [utils.nio.Task]
>>>>> (AgentManager-Handler-6:null) Caught the following exception but
>>> pushing
>>>> on
>>>>> java.lang.NullPointerException
>>>>>
>>>>>          at com.google.gson.FieldAttributes.getAnnotationFromArray(FieldAttributes.java:231)
>>>>>          at com.google.gson.FieldAttributes.getAnnotation(FieldAttributes.java:150)
>>>>>          at com.google.gson.VersionExclusionStrategy.shouldSkipField(VersionExclusionStrategy.java:38)
>>>>>          at com.google.gson.DisjunctionExclusionStrategy.shouldSkipField(DisjunctionExclusionStrategy.java:38)
>>>>>          at com.google.gson.ReflectingFieldNavigator.visitFieldsReflectively(ReflectingFieldNavigator.java:58)
>>>>>          at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:120)
>>>>>          at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:62)
>>>>>          at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:53)
>>>>>          at com.google.gson.Gson.toJsonTree(Gson.java:220)
>>>>>          at com.google.gson.Gson.toJsonTree(Gson.java:197)
>>>>>          at com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.java:56)
>>>>>          at com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.java:37)
>>>>>          at com.google.gson.JsonSerializationVisitor.findAndInvokeCustomSerializer(JsonSerializationVisitor.java:184)
>>>>>          at com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:160)
>>>>>          at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:101)
>>>>>          at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:62)
>>>>>          at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:53)
>>>>>          at com.google.gson.Gson.toJsonTree(Gson.java:220)
>>>>>          at com.google.gson.Gson.toJson(Gson.java:260)
>>>>>          at com.cloud.agent.transport.Request.toBytes(Request.java:316)
>>>>>          at com.cloud.agent.transport.Request.getBytes(Request.java:332)
>>>>>          at com.cloud.agent.manager.ClusteredAgentManagerImpl.cancel(ClusteredAgentManagerImpl.java:435)
>>>>>          at com.cloud.agent.manager.ClusteredAgentManagerImpl$ClusteredAgentHandler.doTask(ClusteredAgentManagerImpl.java:641)
>>>>>          at com.cloud.utils.nio.Task.run(Task.java:83)
>>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>>
>>>>> and then the next set of errors I see over and over are:
>>>>>
>>>>> 2013-10-28 20:42:16,433 DEBUG [cloud.storage.StorageManagerImpl]
>>>>> (StatsCollector-2:null) Unable to send storage pool command to
>>>>> Pool[200|LVM] via 1
>>>>>
>>>>> com.cloud.exception.OperationTimedoutException: Commands
>>>>> 1112277002 to Host 1 timed out after 3600
>>>>>
>>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>>
>>>>> 2013-10-28 20:42:16,434 INFO  [cloud.server.StatsCollector]
>>>>> (StatsCollector-2:null) Unable to reach Pool[200|LVM]
>>>>>
>>>>> com.cloud.exception.StorageUnavailableException: Resource
>>>>> [StoragePool:200] is unreachable: Unable to send command to the pool
>>>>>
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>>
>>>>> I have tried to force reconnect to both hosts but that ends up maxing out
>>>>> a CPU core and filling up the log file with endless log lines.
>>>>>
>>>>> Any thoughts on how to recover my system?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>


Re: Management Server won't connect after cluster shutdown and restart

Posted by Ian Duffy <ia...@ianduffy.ie>.
Hi All,

Thank you very much for the help.

Ended up solving the issue. There was an invalid value in our configuration
table which seemed to prevent a lot of DAOs from being autowired.




On 29 August 2014 21:16, Paul Angus <pa...@shapeblue.com> wrote:

> Hi Ian,
>
> I've seen this kind of behaviour before with KVM hosts reconnecting.
>
> There’s a SELECT … FOR UPDATE query on the op_ha_work table which locks
> the table, stopping other hosts from updating their status. If there are a lot
> of entries in there they all lock each other out. Deleting the entries
> fixed the problem, but you then have to deal with hosts and VMs being up/down
> yourself.
>
> So check the op_ha_work table for lots of entries which can lock up the
> database. If you can check the database for the queries that it's handling
> - that would be best.
>
> Also check that the management server and MySQL DB are tuned for the load
> that is being thrown at them.
> (http://support.citrix.com/article/CTX132020)
> Remember if you have other services such as Nagios or puppet/chef directly
> reading the DB, that adds to the number of connections into the mysql db -
> I have seen the management server starved of mysql connections when a lot
> of hosts are brought back online.
>
>
> Regards
>
> Paul Angus
> Cloud Architect
> S: +44 20 3603 0540 | M: +447711418784 | T: CloudyAngus
> paul.angus@shapeblue.com
>
> -----Original Message-----
> From: creategui@gmail.com [mailto:creategui@gmail.com] On Behalf Of
> Carlos Reategui
> Sent: 29 August 2014 20:55
> To: users@cloudstack.apache.org
> Subject: Re: Management Server won't connect after cluster shutdown and
> restart
>
> Hi Ian,
>
> So the root of the problem was that the machines were not started up in
> the correct order.
>
> My plan had been to stop all VMs from CS, then stop CS, then shut down the
> VM hosts.  On the way back up, the hosts needed to be brought up first, and
> only once they were OK should the CS machine have been started, with
> everything in the same state CS thought it was in when it was shut down.
> Unfortunately CS came up before everything else was the way it expected
> it to be, and I did not realize that at the time.
>
> To resolve it, I went back to my CS db backup from right after I shut down
> the MS, made sure the VM hosts were all as expected, and then started the MS.
>
>
>
>
>
>
> On Fri, Aug 29, 2014 at 8:02 AM, Ian Duffy <ia...@ianduffy.ie> wrote:
>
> > Hi carlos,
> >
> > Did you ever find a fix for this?
> >
> > I'm seeing a same issue on 4.1.1 with Vmware ESXi.
> >
> >
> > On 29 October 2013 04:54, Carlos Reategui <cr...@gmail.com> wrote:
> >
> > > Update.  I cleared out the async_job table and also reset the system VMs
> > > it thought were in Starting state from my previous attempts by setting them
> > > to Stopped.  I also re-set the XS pool master to be the
> > > one XS thinks it is.
> > >
> > > Now when I start the CS MS here are the logs leading up to the first
> > > exception about the Unable to reach the pool:
> > >
> > > 2013-10-28 21:27:11,040 DEBUG [cloud.alert.ClusterAlertAdapter]
> > > (Cluster-Notification-1:null) Management server node 172.30.45.2 is
> > > up, send alert
> > >
> > > 2013-10-28 21:27:11,045 WARN  [cloud.cluster.ClusterManagerImpl]
> > > (Cluster-Notification-1:null) Notifying management server join event
> > took 9
> > > ms
> > >
> > > 2013-10-28 21:27:23,236 DEBUG [cloud.server.StatsCollector]
> > > (StatsCollector-2:null) HostStatsCollector is running...
> > >
> > > 2013-10-28 21:27:23,243 DEBUG [cloud.server.StatsCollector]
> > > (StatsCollector-3:null) VmStatsCollector is running...
> > >
> > > 2013-10-28 21:27:23,247 DEBUG [cloud.server.StatsCollector]
> > > (StatsCollector-1:null) StorageCollector is running...
> > >
> > > 2013-10-28 21:27:23,255 DEBUG [cloud.server.StatsCollector]
> > > (StatsCollector-1:null) There is no secondary storage VM for
> > > secondary storage host nfs://172.30.45.2/store/secondary
> > >
> > > 2013-10-28 21:27:23,273 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Forwarding null to
> > 233845174730255
> > >
> > > 2013-10-28 21:27:23,274 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-9:null) Seq 1-201916421: Routing from
> > 233845174730253
> > >
> > > 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-9:null) Seq 1-201916421: Link is closed
> > >
> > > 2013-10-28 21:27:23,275 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-9:null) Seq 1-201916421: MgmtId 233845174730253: Req:
> > > Resource [Host:1] is unreachable: Host 1: Link is closed
> > >
> > > 2013-10-28 21:27:23,275 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-9:null) Seq 1--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,277 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-11:null) Seq 1--1: MgmtId 233845174730253: Req:
> > > Cancel request received
> > >
> > > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-11:null) Seq 1-201916421: Cancelling.
> > >
> > > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,277 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-2:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list
> > > for exceptions
> > >
> > > 2013-10-28 21:27:23,277 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Timed out on null
> > >
> > > 2013-10-28 21:27:23,278 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Cancelling.
> > >
> > > 2013-10-28 21:27:23,278 WARN  [agent.manager.AgentManagerImpl]
> > > (StatsCollector-2:null) Operation timed out: Commands 201916421 to
> > > Host 1 timed out after 3600
> > >
> > > 2013-10-28 21:27:23,278 WARN  [cloud.resource.ResourceManagerImpl]
> > > (StatsCollector-2:null) Unable to obtain host 1 statistics.
> > >
> > > 2013-10-28 21:27:23,278 WARN  [cloud.server.StatsCollector]
> > > (StatsCollector-2:null) Received invalid host stats for host: 1
> > >
> > > 2013-10-28 21:27:23,281 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Forwarding null to
> > 233845174730255
> > >
> > > 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-12:null) Seq 1-201916422: Routing from
> > > 233845174730253
> > >
> > > 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-12:null) Seq 1-201916422: Link is closed
> > >
> > > 2013-10-28 21:27:23,283 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-12:null) Seq 1-201916422: MgmtId 233845174730253:
> > > Req: Resource [Host:1] is unreachable: Host 1: Link is
> > >
> > > closed
> > >
> > > 2013-10-28 21:27:23,284 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-12:null) Seq 1--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,286 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-13:null) Seq 1--1: MgmtId 233845174730253: Req:
> > > Cancel request received
> > >
> > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-13:null) Seq 1-201916422: Cancelling.
> > >
> > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,286 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-1:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list
> > > for exceptions
> > >
> > > 2013-10-28 21:27:23,286 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Timed out on null
> > >
> > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Cancelling.
> > >
> > > 2013-10-28 21:27:23,288 DEBUG [cloud.storage.StorageManagerImpl]
> > > (StatsCollector-1:null) Unable to send storage pool command to
> > > Pool[200|LVM] via 1
> > >
> > > com.cloud.exception.OperationTimedoutException: Commands 201916422 to Host
> > > 1 timed out after 3600
> > >
> > >         at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > >         at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > >         at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >         at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 21:27:23,289 INFO  [cloud.server.StatsCollector]
> > > (StatsCollector-1:null) Unable to reach Pool[200|LVM]
> > >
> > > com.cloud.exception.StorageUnavailableException: Resource [StoragePool:200]
> > > is unreachable: Unable to send command to the pool
> > >
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >         at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 21:27:23,300 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Forwarding null to
> > > 233845174730255
> > >
> > > 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-14:null) Seq 2-1168703496: Routing from
> > > 233845174730253
> > >
> > > 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-14:null) Seq 2-1168703496: Link is closed
> > >
> > > 2013-10-28 21:27:23,302 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-14:null) Seq 2-1168703496: MgmtId
> 233845174730253:
> > > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> > >
> > > 2013-10-28 21:27:23,302 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-14:null) Seq 2--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,303 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-15:null) Seq 2--1: MgmtId 233845174730253: Req:
> > > Cancel request received
> > >
> > > 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-15:null) Seq 2-1168703496: Cancelling.
> > >
> > > 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,304 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-2:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list
> > > for exceptions
> > >
> > > 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Timed out on null
> > >
> > > 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Cancelling.
> > >
> > > 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentManagerImpl]
> > > (StatsCollector-2:null) Operation timed out: Commands 1168703496 to
> > > Host
> > 2
> > > timed out after 3600
> > >
> > > 2013-10-28 21:27:23,304 WARN  [cloud.resource.ResourceManagerImpl]
> > > (StatsCollector-2:null) Unable to obtain host 2 statistics.
> > >
> > > 2013-10-28 21:27:23,304 WARN  [cloud.server.StatsCollector]
> > > (StatsCollector-2:null) Received invalid host stats for host: 2
> > >
> > > 2013-10-28 21:27:23,307 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Forwarding null to
> > > 233845174730255
> > >
> > > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-1:null) Seq 2-1168703497: Routing from
> > > 233845174730253
> > >
> > > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-1:null) Seq 2-1168703497: Link is closed
> > >
> > > 2013-10-28 21:27:23,308 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-1:null) Seq 2-1168703497: MgmtId 233845174730253:
> > > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> > >
> > > 2013-10-28 21:27:23,308 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-1:null) Seq 2--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,310 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-5:null) Seq 2--1: MgmtId 233845174730253: Req:
> > Cancel
> > > request received
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-5:null) Seq 2-1168703497: Cancelling.
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,310 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-1:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list
> > > for exceptions
> > >
> > > 2013-10-28 21:27:23,310 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Timed out on null
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Cancelling.
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [cloud.storage.StorageManagerImpl]
> > > (StatsCollector-1:null) Unable to send storage pool command to
> > > Pool[201|LVM] via 2
> > >
> > > com.cloud.exception.OperationTimedoutException: Commands 1168703497
> > > to
> > Host
> > > 2 timed out after 3600
> > >
> > >         at
> > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > >
> > >         at
> > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:
> > > 511)
> > >
> > >         at
> > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:
> > > 464)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:2347)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:422)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:436)
> > >
> > >         at
> > >
> > >
> > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja
> > va:316)
> > >
> > >         at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4
> > > 71)
> > >
> > >         at
> > >
> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:
> > 351)
> > >
> > >         at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
> > ccess$201(ScheduledThreadPoolExecutor.java:165)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> > un(ScheduledThreadPoolExecutor.java:267)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> > ava:1146)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> > java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 21:27:23,311 INFO  [cloud.server.StatsCollector]
> > > (StatsCollector-1:null) Unable to reach Pool[201|LVM]
> > >
> > > com.cloud.exception.StorageUnavailableException: Resource
> > [StoragePool:201]
> > > is unreachable: Unable to send command to the pool
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:2357)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:422)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:436)
> > >
> > >         at
> > >
> > >
> > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja
> > va:316)
> > >
> > >         at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4
> > > 71)
> > >
> > >         at
> > >
> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:
> > 351)
> > >
> > >         at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
> > ccess$201(ScheduledThreadPoolExecutor.java:165)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> > un(ScheduledThreadPoolExecutor.java:267)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> > ava:1146)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> > java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 21:27:23,328 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Forwarding null to
> > > 233845174730255
> > >
> > > 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-3:null) Seq 2-1168703498: Routing from
> > > 233845174730253
> > >
> > > 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-3:null) Seq 2-1168703498: Link is closed
> > >
> > > 2013-10-28 21:27:23,329 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-3:null) Seq 2-1168703498: MgmtId 233845174730253:
> > > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> > >
> > > 2013-10-28 21:27:23,330 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-3:null) Seq 2--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,331 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-4:null) Seq 2--1: MgmtId 233845174730253: Req:
> > Cancel
> > > request received
> > >
> > > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-4:null) Seq 2-1168703498: Cancelling.
> > >
> > > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,331 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-1:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list
> > > for exceptions
> > >
> > > 2013-10-28 21:27:23,332 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Timed out on null
> > >
> > > 2013-10-28 21:27:23,332 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Cancelling.
> > >
> > > 2013-10-28 21:27:23,332 DEBUG [cloud.storage.StorageManagerImpl]
> > > (StatsCollector-1:null) Unable to send storage pool command to
> > > Pool[202|NetworkFilesystem] via 2
> > >
> > > com.cloud.exception.OperationTimedoutException: Commands 1168703498
> > > to
> > Host
> > > 2 timed out after 3600
> > >
> > >         at
> > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > >
> > >         at
> > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:
> > > 511)
> > >
> > >         at
> > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:
> > > 464)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:2347)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:422)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:436)
> > >
> > >         at
> > >
> > >
> > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja
> > va:316)
> > >
> > >         at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4
> > > 71)
> > >
> > >         at
> > >
> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:
> > 351)
> > >
> > >         at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
> > ccess$201(ScheduledThreadPoolExecutor.java:165)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> > un(ScheduledThreadPoolExecutor.java:267)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> > ava:1146)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> > java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > IP tables is disable on the XS hosts so the connection prob is not a
> > > firewall issue.
> > >
> > > If I do an xe se-list I see all 3 of the above SRs and the hosts
> > > have mounted the NFS SR and can access it.
> > >
> > >
> > >
> > >
> > > On Mon, Oct 28, 2013 at 9:05 PM, Carlos Reategui
> > > <carlos@reategui.com
> > > >wrote:
> > >
> > > > Using CS 4.1.1 with 2 hosts running XS 6.0.2
> > > >
> > > > Had to shut everything down and now I am having problems bringing
> > things
> > > > up.
> > > >
> > > > As suggested I used CS to stop all my instances as well as the
> > > > system
> > VMs
> > > > and the SR. Then I shutdown the XS 6.02 servers after enabling
> > > maintenance
> > > > mode from the CS console.
> > > >
> > > > After bringing things up, my XS servers had the infamous
> > interface-rename
> > > > issue which I resolved by editing the udev rules file manually.
> > > >
> > > > Now I have my XS servers up but for some reason my pool master got
> > > changed
> > > > so I used xe pool-designate-new-master to switch it back.
> > > >
> > > > I did not notice that this designation change had been picked up
> > > > by CS
> > > and
> > > > when starting it up it keeps trying to connect to the wrong pool
> > master.
> > > >  Should I switch XS to match CS or what do I need to change in CS
> > > > to
> > tell
> > > > it what the pool master is?
> > > >
> > > > I tried putting the server that CS thinks is the master in
> > > > maintenance mode from CS but that just ends up in an apparent
> > > > infinite cycle
> > spitting
> > > > out endless lines like these:
> > > >
> > > > 2013-10-28 20:39:02,059 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-1:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:
> > > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
> > > >
> > > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,060 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-11:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > > Flag
> > > >
> > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,062 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-13:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > > Flag
> > > >
> > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,063 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-7:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:
> > > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
> > > >
> > > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,064 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-15:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > > Flag
> > > >
> > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,066 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-14:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > > Flag
> > > >
> > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,067 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-2:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:
> > > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
> > > >
> > > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,068 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-12:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > > Flag
> > > >
> > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > After stopping and restarting the MS, the first error I see is:
> > > >
> > > > 2013-10-28 20:41:53,749 DEBUG [cloud.api.ApiServlet]
> > > > (catalina-exec-1:null) ===START===  10.110.3.70 -- GET
> > > >
> > >
> > command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88
> > &response=json&sessi
> > > >
> > > > onkey=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624
> > > >
> > > > 2013-10-28 20:41:53,756 ERROR [cloud.api.ApiServlet]
> > > > (catalina-exec-1:null) unknown exception writing api response
> > > >
> > > > java.lang.NullPointerException
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.jav
> > a:280)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.jav
> > a:143)
> > > >
> > > >         at
> > > > com.cloud.api.ApiServlet.processRequest(ApiServlet.java:238)
> > > >
> > > >         at com.cloud.api.ApiServlet.doGet(ApiServlet.java:66)
> > > >
> > > >         at
> > > > javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
> > > >
> > > >         at
> > > > javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appli
> > cationFilterChain.java:290)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFi
> > lterChain.java:206)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperVa
> > lve.java:233)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.core.StandardContextValve.invoke(StandardContextVa
> > lve.java:191)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.ja
> > va:127)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.ja
> > va:102)
> > > >
> > > >         at
> > > >
> > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:6
> > 15)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValv
> > e.java:109)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java
> > :293)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor
> > .java:889)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.pro
> > cess(Http11NioProtocol.java:744)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint
> > .java:2282)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> > ava:1146)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> > java:615)
> > > >
> > > >         at java.lang.Thread.run(Thread.java:679)
> > > >
> > > > 2013-10-28 20:41:53,761 DEBUG [cloud.api.ApiServlet]
> > > > (catalina-exec-1:null) ===END===  10.110.3.70 -- GET
> > > >
> > >
> > command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88
> > &response=json&session
> > > >
> > > > key=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624
> > > >
> > > > Then I see a few of these:
> > > >
> > > > 2013-10-28 20:42:01,464 WARN
> > > > [agent.manager.ClusteredAgentManagerImpl]
> > > > (HA-Worker-4:work-10) Unable to connect to peer management server:
> > > > 233845174730255, ip: 172.30.45.2 due to Connection refused
> > > >
> > > > java.net.ConnectException: Connection refused
> > > >
> > > >         at sun.nio.ch.Net.connect(Native Method)
> > > >
> > > >         at
> > > sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
> > > >
> > > >         at
> > > > java.nio.channels.SocketChannel.open(SocketChannel.java:164)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(Cluste
> > redAgentManagerImpl.java:477)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttac
> > he.java:172)
> > > >
> > > >         at
> > > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
> > > >
> > > >         at
> > > >
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:51
> > 1)
> > > >
> > > >         at
> > > >
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:46
> > 4)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigat
> > or.java:53)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManag
> > erImpl.java:434)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabi
> > lityManagerImpl.java:829)
> > > >
> > > > 2013-10-28 20:42:01,468 WARN
> > > > [agent.manager.ClusteredAgentManagerImpl]
> > > > (HA-Worker-2:work-11) Unable to connect to peer management server:
> > > > 233845174730255, ip: 172.30.45.2 due to Connection refused
> > > >
> > > > java.net.ConnectException: Connection refused
> > > >
> > > >         at sun.nio.ch.Net.connect(Native Method)
> > > >
> > > >         at
> > > sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
> > > >
> > > >         at
> > > > java.nio.channels.SocketChannel.open(SocketChannel.java:164)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(Cluste
> > redAgentManagerImpl.java:477)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttac
> > he.java:172)
> > > >
> > > >         at
> > > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
> > > >
> > > >         at
> > > >
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:51
> > 1)
> > > >
> > > >         at
> > > >
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:46
> > 4)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigat
> > or.java:53)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManag
> > erImpl.java:434)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabi
> > lityManagerImpl.java:829)
> > > >
> > > >
> > > > The next error is:
> > > >
> > > > 2013-10-28 20:42:01,845 WARN  [utils.nio.Task]
> > > > (AgentManager-Handler-6:null) Caught the following exception but
> > pushing
> > > on
> > > >
> > > > java.lang.NullPointerException
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.FieldAttributes.getAnnotationFromArray(FieldAttributes
> > .java:231)
> > > >
> > > >         at
> > > > com.google.gson.FieldAttributes.getAnnotation(FieldAttributes.java
> > > > :150)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.VersionExclusionStrategy.shouldSkipField(VersionExclus
> > ionStrategy.java:38)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.DisjunctionExclusionStrategy.shouldSkipField(Disjuncti
> > onExclusionStrategy.java:38)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.ReflectingFieldNavigator.visitFieldsReflectively(Refle
> > ctingFieldNavigator.java:58)
> > > >
> > > >         at
> > > com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:120)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializ
> > ationContextDefault.java:62)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializ
> > ationContextDefault.java:53)
> > > >
> > > >         at com.google.gson.Gson.toJsonTree(Gson.java:220)
> > > >
> > > >         at com.google.gson.Gson.toJsonTree(Gson.java:197)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.
> > java:56)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.
> > java:37)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.JsonSerializationVisitor.findAndInvokeCustomSerializer
> > (JsonSerializationVisitor.java:184)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonS
> > erializationVisitor.java:160)
> > > >
> > > >         at
> > > com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:101)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializ
> > ationContextDefault.java:62)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializ
> > ationContextDefault.java:53)
> > > >
> > > >         at com.google.gson.Gson.toJsonTree(Gson.java:220)
> > > >
> > > >         at com.google.gson.Gson.toJson(Gson.java:260)
> > > >
> > > >         at
> > > > com.cloud.agent.transport.Request.toBytes(Request.java:316)
> > > >
> > > >         at
> > > > com.cloud.agent.transport.Request.getBytes(Request.java:332)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.manager.ClusteredAgentManagerImpl.cancel(ClusteredAgen
> > tManagerImpl.java:435)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.manager.ClusteredAgentManagerImpl$ClusteredAgentHandle
> > r.doTask(ClusteredAgentManagerImpl.java:641)
> > > >
> > > >         at com.cloud.utils.nio.Task.run(Task.java:83)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> > ava:1146)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> > java:615)
> > > >
> > > >         at java.lang.Thread.run(Thread.java:679)
> > > >
> > > > and then the next set of errors I see over and over are:
> > > >
> > > > 2013-10-28 20:42:16,433 DEBUG [cloud.storage.StorageManagerImpl]
> > > > (StatsCollector-2:null) Unable to send storage pool command to
> > > > Pool[200|LVM] via 1
> > > >
> > > > com.cloud.exception.OperationTimedoutException: Commands
> > > > 1112277002 to Host 1 timed out after 3600
> > > >
> > > >         at
> > > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > > >
> > > >         at
> > > >
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:51
> > 1)
> > > >
> > > >         at
> > > >
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:46
> > 4)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:2347)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:422)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:436)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja
> > va:316)
> > > >
> > > >         at
> > > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java
> > > > :471)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:
> > 351)
> > > >
> > > >         at
> > > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
> > ccess$201(ScheduledThreadPoolExecutor.java:165)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> > un(ScheduledThreadPoolExecutor.java:267)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> > ava:1146)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> > java:615)
> > > >
> > > >         at java.lang.Thread.run(Thread.java:679)
> > > >
> > > > 2013-10-28 20:42:16,434 INFO  [cloud.server.StatsCollector]
> > > > (StatsCollector-2:null) Unable to reach Pool[200|LVM]
> > > >
> > > > com.cloud.exception.StorageUnavailableException: Resource
> > > > [StoragePool:200] is unreachable: Unable to send command to the
> > > > pool
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:2357)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:422)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> > a:436)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja
> > va:316)
> > > >
> > > >         at
> > > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java
> > > > :471)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:
> > 351)
> > > >
> > > >         at
> > > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
> > ccess$201(ScheduledThreadPoolExecutor.java:165)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> > un(ScheduledThreadPoolExecutor.java:267)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> > ava:1146)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> > java:615)
> > > >
> > > >         at java.lang.Thread.run(Thread.java:679)
> > > >
> > > > I have tried to force reconnect to both hosts but that ends up
> > > > maxing
> > out
> > > > a CPU core and filling up the log file with endless log lines.
> > > >
> > > > Any thoughts on how to recover my system?
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >

RE: Management Server won't connect after cluster shutdown and restart

Posted by Paul Angus <pa...@shapeblue.com>.
Hi Ian,

I've seen this kind of behaviour before with KVM hosts reconnecting.

There’s a SELECT … FOR UPDATE query on the op_ha_work table which locks the table, stopping other hosts from updating their status. If there are a lot of entries in there, they all lock each other out. Deleting the entries fixed the problem, but you then have to deal with hosts and VMs being up/down yourself.

So check the op_ha_work table for a large number of entries, which can lock up the database. If you can also check the database for the queries it is currently handling, that would be best.
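
As a rough sketch of that check (this assumes the default 'cloud' database
name - take a mysqldump and stop the management server before deleting
anything):

    mysql> USE cloud;
    mysql> SELECT COUNT(*) FROM op_ha_work;   -- size of the HA work backlog
    mysql> SHOW FULL PROCESSLIST;             -- look for long-running or blocked queries
    mysql> DELETE FROM op_ha_work;            -- last resort: clear the backlog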

Also check that the management server and the MySQL DB are tuned for the load being thrown at them.
(http://support.citrix.com/article/CTX132020)
Remember that if you have other services such as Nagios or Puppet/Chef reading the DB directly, that adds to the number of connections into the MySQL DB - I have seen the management server starved of MySQL connections when a lot of hosts are brought back online.
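
A quick way to see how close you are to the connection limit (standard MySQL
commands, nothing CloudStack-specific):

    mysql> SHOW VARIABLES LIKE 'max_connections';    -- configured ceiling
    mysql> SHOW STATUS LIKE 'Max_used_connections';  -- high-water mark since the last restart
    mysql> SHOW STATUS LIKE 'Threads_connected';     -- in use right now

If Max_used_connections is sitting at max_connections, raise it in my.cnf and restart MySQL.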


Regards

Paul Angus
Cloud Architect
S: +44 20 3603 0540 | M: +447711418784 | T: CloudyAngus
paul.angus@shapeblue.com

-----Original Message-----
From: creategui@gmail.com [mailto:creategui@gmail.com] On Behalf Of Carlos Reategui
Sent: 29 August 2014 20:55
To: users@cloudstack.apache.org
Subject: Re: Management Server won't connect after cluster shutdown and restart

Hi Ian,

So the root of the problem was that the machines were not started up in the correct order.

My plan had been to stop all VMs from CS, then stop CS, then shut down the VM hosts.  Coming back up, the hosts needed to be brought up first, and only once they were OK should the CS machine be brought up, with everything in the same state CS thought it was in when it was shut down.
Unfortunately CS came up before everything else was the way it expected it to be, and I did not realize that at the time.

To resolve it, I went back to my CS DB backup from right after I shut down the MS, made sure the VM hosts were all as expected, and then started the MS.
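
In case it helps anyone else, the rough sequence I followed was something like
this (the service name, DB user and dump file name are from my setup, so
adjust to yours):

    # on the management server
    service cloudstack-management stop
    mysql -u cloud -p cloud < cloud-db-backup.sql   # restore the dump taken at shutdown

    # on the XenServer pool master, confirm hosts and master are what CS expects
    xe host-list params=name-label,enabled
    xe pool-list params=master

    # only then bring the management server back up
    service cloudstack-management start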






On Fri, Aug 29, 2014 at 8:02 AM, Ian Duffy <ia...@ianduffy.ie> wrote:

> Hi Carlos,
>
> Did you ever find a fix for this?
>
> I'm seeing the same issue on 4.1.1 with VMware ESXi.
>
>
> On 29 October 2013 04:54, Carlos Reategui <cr...@gmail.com> wrote:
>
> > Update.  I cleared out the async_job table and also reset the system
> > VMs it thought were in the Starting state from my previous attempts, by
> > setting them from Starting to Stopped.  I also reset the XS pool master
> > to be the one XS thinks it is.
> >
> > Now when I start the CS MS, here are the logs leading up to the first
> > "Unable to reach the pool" exception:
> >
> > 2013-10-28 21:27:11,040 DEBUG [cloud.alert.ClusterAlertAdapter]
> > (Cluster-Notification-1:null) Management server node 172.30.45.2 is
> > up, send alert
> >
> > 2013-10-28 21:27:11,045 WARN  [cloud.cluster.ClusterManagerImpl]
> > (Cluster-Notification-1:null) Notifying management server join event
> took 9
> > ms
> >
> > 2013-10-28 21:27:23,236 DEBUG [cloud.server.StatsCollector]
> > (StatsCollector-2:null) HostStatsCollector is running...
> >
> > 2013-10-28 21:27:23,243 DEBUG [cloud.server.StatsCollector]
> > (StatsCollector-3:null) VmStatsCollector is running...
> >
> > 2013-10-28 21:27:23,247 DEBUG [cloud.server.StatsCollector]
> > (StatsCollector-1:null) StorageCollector is running...
> >
> > 2013-10-28 21:27:23,255 DEBUG [cloud.server.StatsCollector]
> > (StatsCollector-1:null) There is no secondary storage VM for
> > secondary storage host nfs://172.30.45.2/store/secondary
> >
> > 2013-10-28 21:27:23,273 DEBUG [agent.manager.ClusteredAgentAttache]
> > (StatsCollector-2:null) Seq 1-201916421: Forwarding null to
> 233845174730255
> >
> > 2013-10-28 21:27:23,274 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-9:null) Seq 1-201916421: Routing from
> 233845174730253
> >
> > 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-9:null) Seq 1-201916421: Link is closed
> >
> > 2013-10-28 21:27:23,275 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-9:null) Seq 1-201916421: MgmtId 233845174730253:
> Req:
> > Resource [Host:1] is unreachable: Host 1: Link is c
> >
> > losed
> >
> > 2013-10-28 21:27:23,275 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-9:null) Seq 1--1: MgmtId 233845174730253: Req:
> > Routing to peer
> >
> > 2013-10-28 21:27:23,277 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-11:null) Seq 1--1: MgmtId 233845174730253: Req:
> > Cancel request received
> >
> > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > (AgentManager-Handler-11:null) Seq 1-201916421: Cancelling.
> >
> > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 1-201916421: Waiting some more time
> > because this is the current command
> >
> > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 1-201916421: Waiting some more time
> > because this is the current command
> >
> > 2013-10-28 21:27:23,277 INFO  [utils.exception.CSExceptionErrorCode]
> > (StatsCollector-2:null) Could not find exception:
> > com.cloud.exception.OperationTimedoutException in error code list
> > for exceptions
> >
> > 2013-10-28 21:27:23,277 WARN  [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 1-201916421: Timed out on null
> >
> > 2013-10-28 21:27:23,278 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 1-201916421: Cancelling.
> >
> > 2013-10-28 21:27:23,278 WARN  [agent.manager.AgentManagerImpl]
> > (StatsCollector-2:null) Operation timed out: Commands 201916421 to
> > Host 1 timed out after 3600
> >
> > 2013-10-28 21:27:23,278 WARN  [cloud.resource.ResourceManagerImpl]
> > (StatsCollector-2:null) Unable to obtain host 1 statistics.
> >
> > 2013-10-28 21:27:23,278 WARN  [cloud.server.StatsCollector]
> > (StatsCollector-2:null) Received invalid host stats for host: 1
> >
> > 2013-10-28 21:27:23,281 DEBUG [agent.manager.ClusteredAgentAttache]
> > (StatsCollector-1:null) Seq 1-201916422: Forwarding null to
> 233845174730255
> >
> > 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-12:null) Seq 1-201916422: Routing from
> > 233845174730253
> >
> > 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-12:null) Seq 1-201916422: Link is closed
> >
> > 2013-10-28 21:27:23,283 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-12:null) Seq 1-201916422: MgmtId 233845174730253:
> > Req: Resource [Host:1] is unreachable: Host 1: Link is
> >
> > closed
> >
> > 2013-10-28 21:27:23,284 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-12:null) Seq 1--1: MgmtId 233845174730253: Req:
> > Routing to peer
> >
> > 2013-10-28 21:27:23,286 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-13:null) Seq 1--1: MgmtId 233845174730253: Req:
> > Cancel request received
> >
> > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > (AgentManager-Handler-13:null) Seq 1-201916422: Cancelling.
> >
> > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 1-201916422: Waiting some more time
> > because this is the current command
> >
> > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 1-201916422: Waiting some more time
> > because this is the current command
> >
> > 2013-10-28 21:27:23,286 INFO  [utils.exception.CSExceptionErrorCode]
> > (StatsCollector-1:null) Could not find exception:
> > com.cloud.exception.OperationTimedoutException in error code list
> > for exceptions
> >
> > 2013-10-28 21:27:23,286 WARN  [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 1-201916422: Timed out on null
> >
> > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 1-201916422: Cancelling.
> >
> > 2013-10-28 21:27:23,288 DEBUG [cloud.storage.StorageManagerImpl]
> > (StatsCollector-1:null) Unable to send storage pool command to
> > Pool[200|LVM] via 1
> >
> > com.cloud.exception.OperationTimedoutException: Commands 201916422
> > to
> Host
> > 1 timed out after 3600
> >
> >         at
> com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> >
> >         at
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:
> > 511)
> >
> >         at
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:
> > 464)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:2347)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:422)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:436)
> >
> >         at
> >
> >
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja
> va:316)
> >
> >         at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4
> > 71)
> >
> >         at
> >
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:
> 351)
> >
> >         at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
> ccess$201(ScheduledThreadPoolExecutor.java:165)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> un(ScheduledThreadPoolExecutor.java:267)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> ava:1146)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> java:615)
> >
> >         at java.lang.Thread.run(Thread.java:679)
> >
> > 2013-10-28 21:27:23,289 INFO  [cloud.server.StatsCollector]
> > (StatsCollector-1:null) Unable to reach Pool[200|LVM]
> >
> > com.cloud.exception.StorageUnavailableException: Resource
> [StoragePool:200]
> > is unreachable: Unable to send command to the pool
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:2357)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:422)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:436)
> >
> >         at
> >
> >
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja
> va:316)
> >
> >         at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4
> > 71)
> >
> >         at
> >
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:
> 351)
> >
> >         at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
> ccess$201(ScheduledThreadPoolExecutor.java:165)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> un(ScheduledThreadPoolExecutor.java:267)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> ava:1146)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> java:615)
> >
> >         at java.lang.Thread.run(Thread.java:679)
> >
> > 2013-10-28 21:27:23,300 DEBUG [agent.manager.ClusteredAgentAttache]
> > (StatsCollector-2:null) Seq 2-1168703496: Forwarding null to
> > 233845174730255
> >
> > 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-14:null) Seq 2-1168703496: Routing from
> > 233845174730253
> >
> > 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-14:null) Seq 2-1168703496: Link is closed
> >
> > 2013-10-28 21:27:23,302 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-14:null) Seq 2-1168703496: MgmtId 233845174730253:
> > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> >
> > 2013-10-28 21:27:23,302 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-14:null) Seq 2--1: MgmtId 233845174730253: Req:
> > Routing to peer
> >
> > 2013-10-28 21:27:23,303 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-15:null) Seq 2--1: MgmtId 233845174730253: Req:
> > Cancel request received
> >
> > 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
> > (AgentManager-Handler-15:null) Seq 2-1168703496: Cancelling.
> >
> > 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time
> > because this is the current command
> >
> > 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time
> > because this is the current command
> >
> > 2013-10-28 21:27:23,304 INFO  [utils.exception.CSExceptionErrorCode]
> > (StatsCollector-2:null) Could not find exception:
> > com.cloud.exception.OperationTimedoutException in error code list
> > for exceptions
> >
> > 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 2-1168703496: Timed out on null
> >
> > 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 2-1168703496: Cancelling.
> >
> > 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentManagerImpl]
> > (StatsCollector-2:null) Operation timed out: Commands 1168703496 to
> > Host
> 2
> > timed out after 3600
> >
> > 2013-10-28 21:27:23,304 WARN  [cloud.resource.ResourceManagerImpl]
> > (StatsCollector-2:null) Unable to obtain host 2 statistics.
> >
> > 2013-10-28 21:27:23,304 WARN  [cloud.server.StatsCollector]
> > (StatsCollector-2:null) Received invalid host stats for host: 2
> >
> > 2013-10-28 21:27:23,307 DEBUG [agent.manager.ClusteredAgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703497: Forwarding null to
> > 233845174730255
> >
> > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-1:null) Seq 2-1168703497: Routing from
> > 233845174730253
> >
> > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-1:null) Seq 2-1168703497: Link is closed
> >
> > 2013-10-28 21:27:23,308 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-1:null) Seq 2-1168703497: MgmtId 233845174730253:
> > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> >
> > 2013-10-28 21:27:23,308 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-1:null) Seq 2--1: MgmtId 233845174730253: Req:
> > Routing to peer
> >
> > 2013-10-28 21:27:23,310 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-5:null) Seq 2--1: MgmtId 233845174730253: Req:
> Cancel
> > request received
> >
> > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > (AgentManager-Handler-5:null) Seq 2-1168703497: Cancelling.
> >
> > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time
> > because this is the current command
> >
> > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time
> > because this is the current command
> >
> > 2013-10-28 21:27:23,310 INFO  [utils.exception.CSExceptionErrorCode]
> > (StatsCollector-1:null) Could not find exception:
> > com.cloud.exception.OperationTimedoutException in error code list
> > for exceptions
> >
> > 2013-10-28 21:27:23,310 WARN  [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703497: Timed out on null
> >
> > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703497: Cancelling.
> >
> > 2013-10-28 21:27:23,310 DEBUG [cloud.storage.StorageManagerImpl]
> > (StatsCollector-1:null) Unable to send storage pool command to
> > Pool[201|LVM] via 2
> >
> > com.cloud.exception.OperationTimedoutException: Commands 1168703497
> > to
> Host
> > 2 timed out after 3600
> >
> >         at
> com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> >
> >         at
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:
> > 511)
> >
> >         at
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:
> > 464)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:2347)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:422)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:436)
> >
> >         at
> >
> >
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja
> va:316)
> >
> >         at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4
> > 71)
> >
> >         at
> >
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:
> 351)
> >
> >         at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
> ccess$201(ScheduledThreadPoolExecutor.java:165)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> un(ScheduledThreadPoolExecutor.java:267)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> ava:1146)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> java:615)
> >
> >         at java.lang.Thread.run(Thread.java:679)
> >
> > 2013-10-28 21:27:23,311 INFO  [cloud.server.StatsCollector]
> > (StatsCollector-1:null) Unable to reach Pool[201|LVM]
> >
> > com.cloud.exception.StorageUnavailableException: Resource
> [StoragePool:201]
> > is unreachable: Unable to send command to the pool
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:2357)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:422)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:436)
> >
> >         at
> >
> >
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja
> va:316)
> >
> >         at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4
> > 71)
> >
> >         at
> >
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:
> 351)
> >
> >         at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
> ccess$201(ScheduledThreadPoolExecutor.java:165)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> un(ScheduledThreadPoolExecutor.java:267)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> ava:1146)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> java:615)
> >
> >         at java.lang.Thread.run(Thread.java:679)
> >
> > 2013-10-28 21:27:23,328 DEBUG [agent.manager.ClusteredAgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703498: Forwarding null to
> > 233845174730255
> >
> > 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-3:null) Seq 2-1168703498: Routing from
> > 233845174730253
> >
> > 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-3:null) Seq 2-1168703498: Link is closed
> >
> > 2013-10-28 21:27:23,329 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-3:null) Seq 2-1168703498: MgmtId 233845174730253:
> > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> >
> > 2013-10-28 21:27:23,330 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-3:null) Seq 2--1: MgmtId 233845174730253: Req:
> > Routing to peer
> >
> > 2013-10-28 21:27:23,331 DEBUG
> > [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-4:null) Seq 2--1: MgmtId 233845174730253: Req:
> Cancel
> > request received
> >
> > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > (AgentManager-Handler-4:null) Seq 2-1168703498: Cancelling.
> >
> > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time
> > because this is the current command
> >
> > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time
> > because this is the current command
> >
> > 2013-10-28 21:27:23,331 INFO  [utils.exception.CSExceptionErrorCode]
> > (StatsCollector-1:null) Could not find exception:
> > com.cloud.exception.OperationTimedoutException in error code list
> > for exceptions
> >
> > 2013-10-28 21:27:23,332 WARN  [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703498: Timed out on null
> >
> > 2013-10-28 21:27:23,332 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703498: Cancelling.
> >
> > 2013-10-28 21:27:23,332 DEBUG [cloud.storage.StorageManagerImpl]
> > (StatsCollector-1:null) Unable to send storage pool command to
> > Pool[202|NetworkFilesystem] via 2
> >
> > com.cloud.exception.OperationTimedoutException: Commands 1168703498
> > to
> Host
> > 2 timed out after 3600
> >
> >         at
> com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> >
> >         at
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:
> > 511)
> >
> >         at
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:
> > 464)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:2347)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:422)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:436)
> >
> >         at
> >
> >
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja
> va:316)
> >
> >         at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4
> > 71)
> >
> >         at
> >
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:
> 351)
> >
> >         at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
> ccess$201(ScheduledThreadPoolExecutor.java:165)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> un(ScheduledThreadPoolExecutor.java:267)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> ava:1146)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> java:615)
> >
> >         at java.lang.Thread.run(Thread.java:679)
> >
> > IP tables is disabled on the XS hosts, so the connection problem is not a
> > firewall issue.
> >
> > If I do an xe sr-list I see all 3 of the above SRs, and the hosts
> > have mounted the NFS SR and can access it.
> >
> >
> >
> >
> > On Mon, Oct 28, 2013 at 9:05 PM, Carlos Reategui
> > <carlos@reategui.com
> > >wrote:
> >
> > > Using CS 4.1.1 with 2 hosts running XS 6.0.2
> > >
> > > Had to shut everything down and now I am having problems bringing
> things
> > > up.
> > >
> > > As suggested I used CS to stop all my instances as well as the
> > > system
> VMs
> > > and the SR. Then I shutdown the XS 6.02 servers after enabling
> > maintenance
> > > mode from the CS console.
> > >
> > > After bringing things up, my XS servers had the infamous
> interface-rename
> > > issue which I resolved by editing the udev rules file manually.
> > >
> > > Now I have my XS servers up but for some reason my pool master got
> > changed
> > > so I used xe pool-designate-new-master to switch it back.
> > >
> > > I did not notice that this designation change had been picked up
> > > by CS
> > and
> > > when starting it up it keeps trying to connect to the wrong pool
> master.
> > >  Should I switch XS to match CS or what do I need to change in CS
> > > to
> tell
> > > it what the pool master is?
> > >
> > > I tried putting the server that CS thinks is the master in
> > > maintenance mode from CS but that just ends up in an apparent
> > > infinite cycle
> spitting
> > > out endless lines like these:
> > >
> > > 2013-10-28 20:39:02,059 DEBUG
> > > [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-1:null) Seq 2-855048230: Forwarding Seq
> > 2-855048230:
> > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
> > >
> > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > 2013-10-28 20:39:02,060 DEBUG
> > > [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-11:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > Flag
> > >
> > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > 2013-10-28 20:39:02,062 DEBUG
> > > [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-13:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > Flag
> > >
> > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > 2013-10-28 20:39:02,063 DEBUG
> > > [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-7:null) Seq 2-855048230: Forwarding Seq
> > 2-855048230:
> > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
> > >
> > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > 2013-10-28 20:39:02,064 DEBUG
> > > [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-15:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > Flag
> > >
> > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > 2013-10-28 20:39:02,066 DEBUG
> > > [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-14:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > Flag
> > >
> > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > 2013-10-28 20:39:02,067 DEBUG
> > > [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-2:null) Seq 2-855048230: Forwarding Seq
> > 2-855048230:
> > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
> > >
> > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > 2013-10-28 20:39:02,068 DEBUG
> > > [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-12:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > Flag
> > >
> > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > After stopping and restarting the MS, the first error I see is:
> > >
> > > 2013-10-28 20:41:53,749 DEBUG [cloud.api.ApiServlet]
> > > (catalina-exec-1:null) ===START===  10.110.3.70 -- GET
> > >
> >
> command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88
> &response=json&sessi
> > >
> > > onkey=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624
> > >
> > > 2013-10-28 20:41:53,756 ERROR [cloud.api.ApiServlet]
> > > (catalina-exec-1:null) unknown exception writing api response
> > >
> > > java.lang.NullPointerException
> > >
> > >         at
> > >
> >
> com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.jav
> a:280)
> > >
> > >         at
> > >
> >
> com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.jav
> a:143)
> > >
> > >         at
> > > com.cloud.api.ApiServlet.processRequest(ApiServlet.java:238)
> > >
> > >         at com.cloud.api.ApiServlet.doGet(ApiServlet.java:66)
> > >
> > >         at
> > > javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
> > >
> > >         at
> > > javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
> > >
> > >         at
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appli
> cationFilterChain.java:290)
> > >
> > >         at
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFi
> lterChain.java:206)
> > >
> > >         at
> > >
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperVa
> lve.java:233)
> > >
> > >         at
> > >
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextVa
> lve.java:191)
> > >
> > >         at
> > >
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.ja
> va:127)
> > >
> > >         at
> > >
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.ja
> va:102)
> > >
> > >         at
> > >
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:6
> 15)
> > >
> > >         at
> > >
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValv
> e.java:109)
> > >
> > >         at
> > >
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java
> :293)
> > >
> > >         at
> > >
> >
> org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor
> .java:889)
> > >
> > >         at
> > >
> >
> org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.pro
> cess(Http11NioProtocol.java:744)
> > >
> > >         at
> > >
> >
> org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint
> .java:2282)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> ava:1146)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 20:41:53,761 DEBUG [cloud.api.ApiServlet]
> > > (catalina-exec-1:null) ===END===  10.110.3.70 -- GET
> > >
> >
> command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88
> &response=json&session
> > >
> > > key=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624
> > >
> > > Then I see a few of these:
> > >
> > > 2013-10-28 20:42:01,464 WARN
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (HA-Worker-4:work-10) Unable to connect to peer management server:
> > > 233845174730255, ip: 172.30.45.2 due to Connection refused
> > >
> > > java.net.ConnectException: Connection refused
> > >
> > >         at sun.nio.ch.Net.connect(Native Method)
> > >
> > >         at
> > sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
> > >
> > >         at
> > > java.nio.channels.SocketChannel.open(SocketChannel.java:164)
> > >
> > >         at
> > >
> >
> com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(Cluste
> redAgentManagerImpl.java:477)
> > >
> > >         at
> > >
> >
> com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttac
> he.java:172)
> > >
> > >         at
> > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
> > >
> > >         at
> > >
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:51
> 1)
> > >
> > >         at
> > >
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:46
> 4)
> > >
> > >         at
> > >
> >
> com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigat
> or.java:53)
> > >
> > >         at
> > >
> >
> com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManag
> erImpl.java:434)
> > >
> > >         at
> > >
> >
> com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabi
> lityManagerImpl.java:829)
> > >
> > > 2013-10-28 20:42:01,468 WARN
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (HA-Worker-2:work-11) Unable to connect to peer management server:
> > > 233845174730255, ip: 172.30.45.2 due to Connection refused
> > >
> > > java.net.ConnectException: Connection refused
> > >
> > >         at sun.nio.ch.Net.connect(Native Method)
> > >
> > >         at
> > sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
> > >
> > >         at
> > > java.nio.channels.SocketChannel.open(SocketChannel.java:164)
> > >
> > >         at
> > >
> >
> com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(Cluste
> redAgentManagerImpl.java:477)
> > >
> > >         at
> > >
> >
> com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttac
> he.java:172)
> > >
> > >         at
> > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
> > >
> > >         at
> > >
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:51
> 1)
> > >
> > >         at
> > >
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:46
> 4)
> > >
> > >         at
> > >
> >
> com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigat
> or.java:53)
> > >
> > >         at
> > >
> >
> com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManag
> erImpl.java:434)
> > >
> > >         at
> > >
> >
> com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabi
> lityManagerImpl.java:829)
> > >
> > >
> > > The next error is:
> > >
> > > 2013-10-28 20:42:01,845 WARN  [utils.nio.Task]
> > > (AgentManager-Handler-6:null) Caught the following exception but
> pushing
> > on
> > >
> > > java.lang.NullPointerException
> > >
> > >         at
> > >
> >
> com.google.gson.FieldAttributes.getAnnotationFromArray(FieldAttributes
> .java:231)
> > >
> > >         at
> > > com.google.gson.FieldAttributes.getAnnotation(FieldAttributes.java
> > > :150)
> > >
> > >         at
> > >
> >
> com.google.gson.VersionExclusionStrategy.shouldSkipField(VersionExclus
> ionStrategy.java:38)
> > >
> > >         at
> > >
> >
> com.google.gson.DisjunctionExclusionStrategy.shouldSkipField(Disjuncti
> onExclusionStrategy.java:38)
> > >
> > >         at
> > >
> >
> com.google.gson.ReflectingFieldNavigator.visitFieldsReflectively(Refle
> ctingFieldNavigator.java:58)
> > >
> > >         at
> > com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:120)
> > >
> > >         at
> > >
> >
> com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializ
> ationContextDefault.java:62)
> > >
> > >         at
> > >
> >
> com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializ
> ationContextDefault.java:53)
> > >
> > >         at com.google.gson.Gson.toJsonTree(Gson.java:220)
> > >
> > >         at com.google.gson.Gson.toJsonTree(Gson.java:197)
> > >
> > >         at
> > >
> >
> com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.
> java:56)
> > >
> > >         at
> > >
> >
> com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.
> java:37)
> > >
> > >         at
> > >
> >
> com.google.gson.JsonSerializationVisitor.findAndInvokeCustomSerializer
> (JsonSerializationVisitor.java:184)
> > >
> > >         at
> > >
> >
> com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonS
> erializationVisitor.java:160)
> > >
> > >         at
> > com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:101)
> > >
> > >         at
> > >
> >
> com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializ
> ationContextDefault.java:62)
> > >
> > >         at
> > >
> >
> com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializ
> ationContextDefault.java:53)
> > >
> > >         at com.google.gson.Gson.toJsonTree(Gson.java:220)
> > >
> > >         at com.google.gson.Gson.toJson(Gson.java:260)
> > >
> > >         at
> > > com.cloud.agent.transport.Request.toBytes(Request.java:316)
> > >
> > >         at
> > > com.cloud.agent.transport.Request.getBytes(Request.java:332)
> > >
> > >         at
> > >
> >
> com.cloud.agent.manager.ClusteredAgentManagerImpl.cancel(ClusteredAgen
> tManagerImpl.java:435)
> > >
> > >         at
> > >
> >
> com.cloud.agent.manager.ClusteredAgentManagerImpl$ClusteredAgentHandle
> r.doTask(ClusteredAgentManagerImpl.java:641)
> > >
> > >         at com.cloud.utils.nio.Task.run(Task.java:83)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> ava:1146)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > and then the next set of errors I see over and over are:
> > >
> > > 2013-10-28 20:42:16,433 DEBUG [cloud.storage.StorageManagerImpl]
> > > (StatsCollector-2:null) Unable to send storage pool command to
> > > Pool[200|LVM] via 1
> > >
> > > com.cloud.exception.OperationTimedoutException: Commands
> > > 1112277002 to Host 1 timed out after 3600
> > >
> > >         at
> > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > >
> > >         at
> > >
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:51
> 1)
> > >
> > >         at
> > >
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:46
> 4)
> > >
> > >         at
> > >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:2347)
> > >
> > >         at
> > >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:422)
> > >
> > >         at
> > >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:436)
> > >
> > >         at
> > >
> >
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja
> va:316)
> > >
> > >         at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java
> > > :471)
> > >
> > >         at
> > >
> >
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:
> 351)
> > >
> > >         at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
> ccess$201(ScheduledThreadPoolExecutor.java:165)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> un(ScheduledThreadPoolExecutor.java:267)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> ava:1146)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 20:42:16,434 INFO  [cloud.server.StatsCollector]
> > > (StatsCollector-2:null) Unable to reach Pool[200|LVM]
> > >
> > > com.cloud.exception.StorageUnavailableException: Resource
> > > [StoragePool:200] is unreachable: Unable to send command to the
> > > pool
> > >
> > >         at
> > >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:2357)
> > >
> > >         at
> > >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:422)
> > >
> > >         at
> > >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav
> a:436)
> > >
> > >         at
> > >
> >
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja
> va:316)
> > >
> > >         at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java
> > > :471)
> > >
> > >         at
> > >
> >
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:
> 351)
> > >
> > >         at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
> ccess$201(ScheduledThreadPoolExecutor.java:165)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> un(ScheduledThreadPoolExecutor.java:267)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> ava:1146)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > I have tried to force reconnect to both hosts but that ends up
> > > maxing
> out
> > > a CPU core and filling up the log file with endless log lines.
> > >
> > > Any thoughts on how to recover my system?
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
>

RE: Management Server won't connect after cluster shutdown and restart

Posted by Michael Phillips <mp...@hotmail.com>.
I posted an email yesterday describing how I shut down and restart my CS instances. It works 100%.
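
For anyone hitting this later, here is a minimal sketch of that kind of orderly shutdown and startup, pieced together from the sequence described further down the thread (stop guest and system VMs from CloudStack, stop the management server, power down the hypervisors, and reverse the order on the way back up). Service and host names are illustrative and vary by CloudStack version and packaging, so treat it as an outline rather than an exact recipe:

  # --- shutdown ---
  # 1. From the CloudStack UI/API: stop guest instances and the system VMs,
  #    then put the XenServer hosts into maintenance mode.
  service cloudstack-management stop        # "cloud-management" on older packages
  # 2. On each XenServer host:
  xe host-disable host=<xenserver-host>
  xe host-shutdown host=<xenserver-host>

  # --- startup ---
  # 3. Power the XenServer hosts back on and confirm the pool master is the
  #    one CloudStack expects before starting the management server.
  xe pool-list params=name-label,master
  service cloudstack-management start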

> Date: Fri, 29 Aug 2014 12:54:38 -0700
> Subject: Re: Management Server won't connect after cluster shutdown and restart
> From: carlos@reategui.com
> To: users@cloudstack.apache.org
> 
> Hi Ian,
> 
> So the root of the problem was that the machines were not started up in
> the correct order.
> 
> My plan had been to stop all VMs from CS, then stop CS, then shut down the
> VM hosts.  On the way back up, the hosts needed to be brought up first and,
> once they were OK, the CS machine brought up and everything checked to be
> in the same state CS thought it was in when it was shut down.
> Unfortunately CS came up before everything else was the way it expected it
> to be, and I did not realize that at the time.
> 
> To resolve it, I went back to my CS db backup taken right after I shut down
> the MS, made sure the VM hosts were all as expected, and then started the
> MS.
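
For reference, a minimal sketch of that backup-and-restore step, assuming a MySQL backend with CloudStack's default database and DB user (both named "cloud"); the dump file name is just a placeholder:

  # snapshot taken right after the management server was cleanly stopped
  mysqldump -u cloud -p cloud > cloud-db-backup.sql

  # recovery: with the XS hosts back up and the pool master as expected,
  # restore the snapshot and only then start the management server
  service cloudstack-management stop        # make sure the MS is not running
  mysql -u cloud -p cloud < cloud-db-backup.sql
  service cloudstack-management start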
> 
> 
> 
> 
> 
> 
> On Fri, Aug 29, 2014 at 8:02 AM, Ian Duffy <ia...@ianduffy.ie> wrote:
> 
> > Hi Carlos,
> >
> > Did you ever find a fix for this?
> >
> > I'm seeing the same issue on 4.1.1 with VMware ESXi.
> >
> >
> > On 29 October 2013 04:54, Carlos Reategui <cr...@gmail.com> wrote:
> >
> > > Update.  I cleared out the async_job table and also reset the system VMs
> > > it thought were in Starting mode from my previous attempts by setting
> > > them to Stopped.  I also re-set the XS pool master to be the one XS
> > > thinks it is.
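
A rough sketch of that kind of manual cleanup, run with the management server stopped and a fresh database backup in hand. The table and column names (async_job, vm_instance.state, vm_instance.type) are recalled from the 4.1-era schema and the system-VM filter is an assumption, so verify them against your own database before running anything:

  mysql -u cloud -p cloud <<'SQL'
  -- illustrative only: clear the stale async jobs
  DELETE FROM async_job;
  -- reset system VMs left in Starting by the earlier attempts
  UPDATE vm_instance
     SET state = 'Stopped'
   WHERE state = 'Starting'
     AND type IN ('ConsoleProxy', 'SecondaryStorageVm', 'DomainRouter');
  SQL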
> > >
> > > Now when I start the CS MS, here are the logs leading up to the first
> > > "Unable to reach Pool" exception:
> > >
> > > 2013-10-28 21:27:11,040 DEBUG [cloud.alert.ClusterAlertAdapter]
> > > (Cluster-Notification-1:null) Management server node 172.30.45.2 is up,
> > > send alert
> > >
> > > 2013-10-28 21:27:11,045 WARN  [cloud.cluster.ClusterManagerImpl]
> > > (Cluster-Notification-1:null) Notifying management server join event
> > took 9
> > > ms
> > >
> > > 2013-10-28 21:27:23,236 DEBUG [cloud.server.StatsCollector]
> > > (StatsCollector-2:null) HostStatsCollector is running...
> > >
> > > 2013-10-28 21:27:23,243 DEBUG [cloud.server.StatsCollector]
> > > (StatsCollector-3:null) VmStatsCollector is running...
> > >
> > > 2013-10-28 21:27:23,247 DEBUG [cloud.server.StatsCollector]
> > > (StatsCollector-1:null) StorageCollector is running...
> > >
> > > 2013-10-28 21:27:23,255 DEBUG [cloud.server.StatsCollector]
> > > (StatsCollector-1:null) There is no secondary storage VM for secondary
> > > storage host nfs://172.30.45.2/store/secondary
> > >
> > > 2013-10-28 21:27:23,273 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Forwarding null to
> > 233845174730255
> > >
> > > 2013-10-28 21:27:23,274 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-9:null) Seq 1-201916421: Routing from
> > 233845174730253
> > >
> > > 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-9:null) Seq 1-201916421: Link is closed
> > >
> > > 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-9:null) Seq 1-201916421: MgmtId 233845174730253:
> > > Req: Resource [Host:1] is unreachable: Host 1: Link is closed
> > >
> > > 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-9:null) Seq 1--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,277 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-11:null) Seq 1--1: MgmtId 233845174730253: Req:
> > > Cancel request received
> > >
> > > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-11:null) Seq 1-201916421: Cancelling.
> > >
> > > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Waiting some more time because
> > > this is the current command
> > >
> > > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Waiting some more time because
> > > this is the current command
> > >
> > > 2013-10-28 21:27:23,277 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-2:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list for
> > > exceptions
> > >
> > > 2013-10-28 21:27:23,277 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Timed out on null
> > >
> > > 2013-10-28 21:27:23,278 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Cancelling.
> > >
> > > 2013-10-28 21:27:23,278 WARN  [agent.manager.AgentManagerImpl]
> > > (StatsCollector-2:null) Operation timed out: Commands 201916421 to Host 1
> > > timed out after 3600
> > >
> > > 2013-10-28 21:27:23,278 WARN  [cloud.resource.ResourceManagerImpl]
> > > (StatsCollector-2:null) Unable to obtain host 1 statistics.
> > >
> > > 2013-10-28 21:27:23,278 WARN  [cloud.server.StatsCollector]
> > > (StatsCollector-2:null) Received invalid host stats for host: 1
> > >
> > > 2013-10-28 21:27:23,281 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Forwarding null to
> > 233845174730255
> > >
> > > 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-12:null) Seq 1-201916422: Routing from
> > > 233845174730253
> > >
> > > 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-12:null) Seq 1-201916422: Link is closed
> > >
> > > 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-12:null) Seq 1-201916422: MgmtId 233845174730253:
> > > Req: Resource [Host:1] is unreachable: Host 1: Link is closed
> > >
> > > 2013-10-28 21:27:23,284 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-12:null) Seq 1--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-13:null) Seq 1--1: MgmtId 233845174730253: Req:
> > > Cancel request received
> > >
> > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-13:null) Seq 1-201916422: Cancelling.
> > >
> > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Waiting some more time because
> > > this is the current command
> > >
> > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Waiting some more time because
> > > this is the current command
> > >
> > > 2013-10-28 21:27:23,286 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-1:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list for
> > > exceptions
> > >
> > > 2013-10-28 21:27:23,286 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Timed out on null
> > >
> > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Cancelling.
> > >
> > > 2013-10-28 21:27:23,288 DEBUG [cloud.storage.StorageManagerImpl]
> > > (StatsCollector-1:null) Unable to send storage pool command to
> > > Pool[200|LVM] via 1
> > >
> > > com.cloud.exception.OperationTimedoutException: Commands 201916422 to
> > Host
> > > 1 timed out after 3600
> > >
> > >         at
> > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > >
> > >         at
> > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > >
> > >         at
> > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >
> > >         at
> > >
> > >
> > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >
> > >         at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >
> > >         at
> > >
> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >
> > >         at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 21:27:23,289 INFO  [cloud.server.StatsCollector]
> > > (StatsCollector-1:null) Unable to reach Pool[200|LVM]
> > >
> > > com.cloud.exception.StorageUnavailableException: Resource
> > [StoragePool:200]
> > > is unreachable: Unable to send command to the pool
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >
> > >         at
> > >
> > >
> > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >
> > >         at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >
> > >         at
> > >
> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >
> > >         at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 21:27:23,300 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Forwarding null to
> > > 233845174730255
> > >
> > > 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-14:null) Seq 2-1168703496: Routing from
> > > 233845174730253
> > >
> > > 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-14:null) Seq 2-1168703496: Link is closed
> > >
> > > 2013-10-28 21:27:23,302 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-14:null) Seq 2-1168703496: MgmtId 233845174730253:
> > > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> > >
> > > 2013-10-28 21:27:23,302 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-14:null) Seq 2--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,303 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-15:null) Seq 2--1: MgmtId 233845174730253: Req:
> > > Cancel request received
> > >
> > > 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-15:null) Seq 2-1168703496: Cancelling.
> > >
> > > 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time because
> > > this is the current command
> > >
> > > 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time because
> > > this is the current command
> > >
> > > 2013-10-28 21:27:23,304 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-2:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list for
> > > exceptions
> > >
> > > 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Timed out on null
> > >
> > > 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Cancelling.
> > >
> > > 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentManagerImpl]
> > > (StatsCollector-2:null) Operation timed out: Commands 1168703496 to Host
> > 2
> > > timed out after 3600
> > >
> > > 2013-10-28 21:27:23,304 WARN  [cloud.resource.ResourceManagerImpl]
> > > (StatsCollector-2:null) Unable to obtain host 2 statistics.
> > >
> > > 2013-10-28 21:27:23,304 WARN  [cloud.server.StatsCollector]
> > > (StatsCollector-2:null) Received invalid host stats for host: 2
> > >
> > > 2013-10-28 21:27:23,307 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Forwarding null to
> > > 233845174730255
> > >
> > > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-1:null) Seq 2-1168703497: Routing from
> > > 233845174730253
> > >
> > > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-1:null) Seq 2-1168703497: Link is closed
> > >
> > > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-1:null) Seq 2-1168703497: MgmtId 233845174730253:
> > > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> > >
> > > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-1:null) Seq 2--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-5:null) Seq 2--1: MgmtId 233845174730253: Req:
> > Cancel
> > > request received
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-5:null) Seq 2-1168703497: Cancelling.
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time because
> > > this is the current command
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time because
> > > this is the current command
> > >
> > > 2013-10-28 21:27:23,310 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-1:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list for
> > > exceptions
> > >
> > > 2013-10-28 21:27:23,310 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Timed out on null
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Cancelling.
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [cloud.storage.StorageManagerImpl]
> > > (StatsCollector-1:null) Unable to send storage pool command to
> > > Pool[201|LVM] via 2
> > >
> > > com.cloud.exception.OperationTimedoutException: Commands 1168703497 to
> > Host
> > > 2 timed out after 3600
> > >
> > >         at
> > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > >
> > >         at
> > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > >
> > >         at
> > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >
> > >         at
> > >
> > >
> > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >
> > >         at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >
> > >         at
> > >
> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >
> > >         at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 21:27:23,311 INFO  [cloud.server.StatsCollector]
> > > (StatsCollector-1:null) Unable to reach Pool[201|LVM]
> > >
> > > com.cloud.exception.StorageUnavailableException: Resource
> > [StoragePool:201]
> > > is unreachable: Unable to send command to the pool
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >
> > >         at
> > >
> > >
> > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >
> > >         at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >
> > >         at
> > >
> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >
> > >         at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 21:27:23,328 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Forwarding null to
> > > 233845174730255
> > >
> > > 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-3:null) Seq 2-1168703498: Routing from
> > > 233845174730253
> > >
> > > 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-3:null) Seq 2-1168703498: Link is closed
> > >
> > > 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-3:null) Seq 2-1168703498: MgmtId 233845174730253:
> > > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> > >
> > > 2013-10-28 21:27:23,330 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-3:null) Seq 2--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,331 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-4:null) Seq 2--1: MgmtId 233845174730253: Req:
> > Cancel
> > > request received
> > >
> > > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-4:null) Seq 2-1168703498: Cancelling.
> > >
> > > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time because
> > > this is the current command
> > >
> > > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time because
> > > this is the current command
> > >
> > > 2013-10-28 21:27:23,331 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-1:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list for
> > > exceptions
> > >
> > > 2013-10-28 21:27:23,332 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Timed out on null
> > >
> > > 2013-10-28 21:27:23,332 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Cancelling.
> > >
> > > 2013-10-28 21:27:23,332 DEBUG [cloud.storage.StorageManagerImpl]
> > > (StatsCollector-1:null) Unable to send storage pool command to
> > > Pool[202|NetworkFilesystem] via 2
> > >
> > > com.cloud.exception.OperationTimedoutException: Commands 1168703498 to
> > Host
> > > 2 timed out after 3600
> > >
> > >         at
> > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > >
> > >         at
> > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > >
> > >         at
> > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >
> > >         at
> > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >
> > >         at
> > >
> > >
> > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >
> > >         at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >
> > >         at
> > >
> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >
> > >         at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >
> > >         at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > IP tables is disabled on the XS hosts, so the connection problem is not a
> > > firewall issue.
> > >
> > > If I do an xe sr-list I see all 3 of the above SRs, and the hosts have
> > > mounted the NFS SR and can access it.
> > >
> > >
> > >
> > >
> > > On Mon, Oct 28, 2013 at 9:05 PM, Carlos Reategui <carlos@reategui.com
> > > >wrote:
> > >
> > > > Using CS 4.1.1 with 2 hosts running XS 6.0.2
> > > >
> > > > Had to shut everything down and now I am having problems bringing
> > things
> > > > up.
> > > >
> > > > As suggested I used CS to stop all my instances as well as the system
> > VMs
> > > > and the SR. Then I shutdown the XS 6.02 servers after enabling
> > > maintenance
> > > > mode from the CS console.
> > > >
> > > > After bringing things up, my XS servers had the infamous
> > interface-rename
> > > > issue which I resolved by editing the udev rules file manually.
> > > >
> > > > Now I have my XS servers up but for some reason my pool master got
> > > changed
> > > > so I used xe pool-designate-new-master to switch it back.
> > > >
> > > > I did not notice that this designation change had been picked up by CS
> > > and
> > > > when starting it up it keeps trying to connect to the wrong pool
> > master.
> > > >  Should I switch XS to match CS or what do I need to change in CS to
> > tell
> > > > it what the pool master is?
> > > >
> > > > I tried putting the server that CS thinks is the master in maintenance
> > > > mode from CS but that just ends up in an apparent infinite cycle
> > spitting
> > > > out endless lines like these:
> > > >
> > > > 2013-10-28 20:39:02,059 DEBUG [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-1:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:
> > > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
> > > >
> > > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,060 DEBUG [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-11:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flag
> > > >
> > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,062 DEBUG [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-13:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flag
> > > >
> > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,063 DEBUG [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-7:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:
> > > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
> > > >
> > > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,064 DEBUG [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-15:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flag
> > > >
> > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,066 DEBUG [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-14:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flag
> > > >
> > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,067 DEBUG [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-2:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:
> > > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
> > > >
> > > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,068 DEBUG [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-12:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flag
> > > >
> > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > After stopping and restarting the MS, the first error I see is:
> > > >
> > > > 2013-10-28 20:41:53,749 DEBUG [cloud.api.ApiServlet]
> > > > (catalina-exec-1:null) ===START===  10.110.3.70 -- GET
> > > >
> > >
> > command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88&response=json&sessi
> > > >
> > > > onkey=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624
> > > >
> > > > 2013-10-28 20:41:53,756 ERROR [cloud.api.ApiServlet]
> > > > (catalina-exec-1:null) unknown exception writing api response
> > > >
> > > > java.lang.NullPointerException
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.java:280)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.java:143)
> > > >
> > > >         at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:238)
> > > >
> > > >         at com.cloud.api.ApiServlet.doGet(ApiServlet.java:66)
> > > >
> > > >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
> > > >
> > > >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> > > >
> > > >         at
> > > >
> > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:615)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
> > > >
> > > >         at
> > > >
> > >
> > org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2282)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >
> > > >         at java.lang.Thread.run(Thread.java:679)
> > > >
> > > > 2013-10-28 20:41:53,761 DEBUG [cloud.api.ApiServlet]
> > > > (catalina-exec-1:null) ===END===  10.110.3.70 -- GET
> > > >
> > >
> > command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88&response=json&session
> > > >
> > > > key=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624
> > > >
> > > > Then I see a few of these:
> > > >
> > > > 2013-10-28 20:42:01,464 WARN  [agent.manager.ClusteredAgentManagerImpl]
> > > > (HA-Worker-4:work-10) Unable to connect to peer management server:
> > > > 233845174730255, ip: 172.30.45.2 due to Connection refused
> > > >
> > > > java.net.ConnectException: Connection refused
> > > >
> > > >         at sun.nio.ch.Net.connect(Native Method)
> > > >
> > > >         at
> > > sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
> > > >
> > > >         at java.nio.channels.SocketChannel.open(SocketChannel.java:164)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(ClusteredAgentManagerImpl.java:477)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttache.java:172)
> > > >
> > > >         at
> > > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
> > > >
> > > >         at
> > > >
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > > >
> > > >         at
> > > >
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigator.java:53)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:434)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:829)
> > > >
> > > > 2013-10-28 20:42:01,468 WARN  [agent.manager.ClusteredAgentManagerImpl]
> > > > (HA-Worker-2:work-11) Unable to connect to peer management server:
> > > > 233845174730255, ip: 172.30.45.2 due to Connection refused
> > > >
> > > > java.net.ConnectException: Connection refused
> > > >
> > > >         at sun.nio.ch.Net.connect(Native Method)
> > > >
> > > >         at
> > > sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
> > > >
> > > >         at java.nio.channels.SocketChannel.open(SocketChannel.java:164)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(ClusteredAgentManagerImpl.java:477)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttache.java:172)
> > > >
> > > >         at
> > > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
> > > >
> > > >         at
> > > >
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > > >
> > > >         at
> > > >
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigator.java:53)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:434)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:829)
> > > >
> > > >
> > > > The next error is:
> > > >
> > > > 2013-10-28 20:42:01,845 WARN  [utils.nio.Task]
> > > > (AgentManager-Handler-6:null) Caught the following exception but
> > pushing
> > > on
> > > >
> > > > java.lang.NullPointerException
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.FieldAttributes.getAnnotationFromArray(FieldAttributes.java:231)
> > > >
> > > >         at
> > > > com.google.gson.FieldAttributes.getAnnotation(FieldAttributes.java:150)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.VersionExclusionStrategy.shouldSkipField(VersionExclusionStrategy.java:38)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.DisjunctionExclusionStrategy.shouldSkipField(DisjunctionExclusionStrategy.java:38)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.ReflectingFieldNavigator.visitFieldsReflectively(ReflectingFieldNavigator.java:58)
> > > >
> > > >         at
> > > com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:120)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:62)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:53)
> > > >
> > > >         at com.google.gson.Gson.toJsonTree(Gson.java:220)
> > > >
> > > >         at com.google.gson.Gson.toJsonTree(Gson.java:197)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.java:56)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.java:37)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.JsonSerializationVisitor.findAndInvokeCustomSerializer(JsonSerializationVisitor.java:184)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:160)
> > > >
> > > >         at
> > > com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:101)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:62)
> > > >
> > > >         at
> > > >
> > >
> > com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:53)
> > > >
> > > >         at com.google.gson.Gson.toJsonTree(Gson.java:220)
> > > >
> > > >         at com.google.gson.Gson.toJson(Gson.java:260)
> > > >
> > > >         at com.cloud.agent.transport.Request.toBytes(Request.java:316)
> > > >
> > > >         at com.cloud.agent.transport.Request.getBytes(Request.java:332)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.manager.ClusteredAgentManagerImpl.cancel(ClusteredAgentManagerImpl.java:435)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.agent.manager.ClusteredAgentManagerImpl$ClusteredAgentHandler.doTask(ClusteredAgentManagerImpl.java:641)
> > > >
> > > >         at com.cloud.utils.nio.Task.run(Task.java:83)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >
> > > >         at java.lang.Thread.run(Thread.java:679)
> > > >
> > > > and then the next set of errors I see over and over are:
> > > >
> > > > 2013-10-28 20:42:16,433 DEBUG [cloud.storage.StorageManagerImpl]
> > > > (StatsCollector-2:null) Unable to send storage pool command to
> > > > Pool[200|LVM] via 1
> > > >
> > > > com.cloud.exception.OperationTimedoutException: Commands 1112277002 to
> > > > Host 1 timed out after 3600
> > > >
> > > >         at
> > > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > > >
> > > >         at
> > > >
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > > >
> > > >         at
> > > >
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > > >
> > > >         at
> > > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > > >
> > > >         at
> > > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >
> > > >         at java.lang.Thread.run(Thread.java:679)
> > > >
> > > > 2013-10-28 20:42:16,434 INFO  [cloud.server.StatsCollector]
> > > > (StatsCollector-2:null) Unable to reach Pool[200|LVM]
> > > >
> > > > com.cloud.exception.StorageUnavailableException: Resource
> > > > [StoragePool:200] is unreachable: Unable to send command to the pool
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > > >
> > > >         at
> > > >
> > >
> > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > > >
> > > >         at
> > > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > > >
> > > >         at
> > > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > >
> > > >         at
> > > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >
> > > >         at java.lang.Thread.run(Thread.java:679)
> > > >
> > > > I have tried to force reconnect to both hosts but that ends up maxing
> > out
> > > > a CPU core and filling up the log file with endless log lines.
> > > >
> > > > Any thoughts on how to recover my system?
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >

Re: Management Server won't connect after cluster shutdown and restart

Posted by Carlos Reategui <ca...@reategui.com>.
Hi Ian,

So the root of the problem was that the machines were not started up in
the correct order.

My plan had been to stop all VMs from CS, then stop CS, then shut down the
VM hosts.  Coming back up, the hosts needed to be brought up first, and
only once they were OK should the CS machine be started, so that everything
was in the same state CS thought it was in when it was shut down.
Unfortunately CS came up before everything else was the way it expected,
and I did not realize that at the time.
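
For anyone who hits this later, the bring-up order I now aim for is roughly
the sketch below.  It is only an outline for my setup: <hostname> is a
placeholder, and I am assuming the 4.1.1 packages, where the management
server service is named cloudstack-management.

  # 1. Power the XS hosts back on, let the pool settle, and check that the
  #    pool master is the one CS last knew about
  xe pool-list params=master
  xe host-list params=name-label,enabled

  # 2. If the master moved, put it back to the one CS expects
  xe pool-designate-new-master host=<hostname>

  # 3. Re-enable any host still flagged as in maintenance
  xe host-enable host=<hostname>

  # 4. Only when the hosts look right, start the management server
  service cloudstack-management start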

To resolve it, I went back to my CS db backup taken right after I shut
down the MS, made sure the VM hosts were all as expected, and then started
the MS.
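
In case the details help anyone, the restore itself was just reloading the
dump with the MS stopped.  Here is a rough sketch of what that looked like,
assuming the default 'cloud' database and user and a backup file name of my
own; the SQL at the end mirrors the interim cleanup described in the quoted
update below, with table and column names recalled from the 4.1 schema, so
treat it as illustration rather than a recipe.

  # reload the backup taken right after the MS was shut down
  service cloudstack-management stop
  mysql -u cloud -p cloud < cloud-db-backup.sql
  service cloudstack-management start

  # (earlier attempt, before falling back to the full restore: clear stale
  # async jobs and flip system VMs stuck in Starting back to Stopped)
  mysql -u cloud -p cloud -e "DELETE FROM async_job;"
  mysql -u cloud -p cloud -e "UPDATE vm_instance SET state='Stopped' WHERE state='Starting';"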






On Fri, Aug 29, 2014 at 8:02 AM, Ian Duffy <ia...@ianduffy.ie> wrote:

> Hi Carlos,
>
> Did you ever find a fix for this?
>
> I'm seeing the same issue on 4.1.1 with VMware ESXi.
>
>
> On 29 October 2013 04:54, Carlos Reategui <cr...@gmail.com> wrote:
>
> > Update.  I cleared out the async_job table and also reset the system VMs
> > it thought were in Starting from my previous attempts by setting them to
> > Stopped.  I also re-set the XS pool master to be the one XS thinks it is.
> >
> > Now when I start the CS MS here are the logs leading up to the first
> > exception about the Unable to reach the pool:
> >
> > 2013-10-28 21:27:11,040 DEBUG [cloud.alert.ClusterAlertAdapter]
> > (Cluster-Notification-1:null) Management server node 172.30.45.2 is up,
> > send alert
> >
> > 2013-10-28 21:27:11,045 WARN  [cloud.cluster.ClusterManagerImpl]
> > (Cluster-Notification-1:null) Notifying management server join event
> took 9
> > ms
> >
> > 2013-10-28 21:27:23,236 DEBUG [cloud.server.StatsCollector]
> > (StatsCollector-2:null) HostStatsCollector is running...
> >
> > 2013-10-28 21:27:23,243 DEBUG [cloud.server.StatsCollector]
> > (StatsCollector-3:null) VmStatsCollector is running...
> >
> > 2013-10-28 21:27:23,247 DEBUG [cloud.server.StatsCollector]
> > (StatsCollector-1:null) StorageCollector is running...
> >
> > 2013-10-28 21:27:23,255 DEBUG [cloud.server.StatsCollector]
> > (StatsCollector-1:null) There is no secondary storage VM for secondary
> > storage host nfs://172.30.45.2/store/secondary
> >
> > 2013-10-28 21:27:23,273 DEBUG [agent.manager.ClusteredAgentAttache]
> > (StatsCollector-2:null) Seq 1-201916421: Forwarding null to
> 233845174730255
> >
> > 2013-10-28 21:27:23,274 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-9:null) Seq 1-201916421: Routing from
> 233845174730253
> >
> > 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-9:null) Seq 1-201916421: Link is closed
> >
> > 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-9:null) Seq 1-201916421: MgmtId 233845174730253:
> Req:
> > Resource [Host:1] is unreachable: Host 1: Link is c
> >
> > losed
> >
> > 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-9:null) Seq 1--1: MgmtId 233845174730253: Req:
> > Routing to peer
> >
> > 2013-10-28 21:27:23,277 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-11:null) Seq 1--1: MgmtId 233845174730253: Req:
> > Cancel request received
> >
> > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > (AgentManager-Handler-11:null) Seq 1-201916421: Cancelling.
> >
> > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 1-201916421: Waiting some more time because
> > this is the current command
> >
> > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 1-201916421: Waiting some more time because
> > this is the current command
> >
> > 2013-10-28 21:27:23,277 INFO  [utils.exception.CSExceptionErrorCode]
> > (StatsCollector-2:null) Could not find exception:
> > com.cloud.exception.OperationTimedoutException in error code list for
> > exceptions
> >
> > 2013-10-28 21:27:23,277 WARN  [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 1-201916421: Timed out on null
> >
> > 2013-10-28 21:27:23,278 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 1-201916421: Cancelling.
> >
> > 2013-10-28 21:27:23,278 WARN  [agent.manager.AgentManagerImpl]
> > (StatsCollector-2:null) Operation timed out: Commands 201916421 to Host 1
> > timed out after 3600
> >
> > 2013-10-28 21:27:23,278 WARN  [cloud.resource.ResourceManagerImpl]
> > (StatsCollector-2:null) Unable to obtain host 1 statistics.
> >
> > 2013-10-28 21:27:23,278 WARN  [cloud.server.StatsCollector]
> > (StatsCollector-2:null) Received invalid host stats for host: 1
> >
> > 2013-10-28 21:27:23,281 DEBUG [agent.manager.ClusteredAgentAttache]
> > (StatsCollector-1:null) Seq 1-201916422: Forwarding null to
> 233845174730255
> >
> > 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-12:null) Seq 1-201916422: Routing from
> > 233845174730253
> >
> > 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-12:null) Seq 1-201916422: Link is closed
> >
> > 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-12:null) Seq 1-201916422: MgmtId 233845174730253:
> > Req: Resource [Host:1] is unreachable: Host 1: Link is
> >
> > closed
> >
> > 2013-10-28 21:27:23,284 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-12:null) Seq 1--1: MgmtId 233845174730253: Req:
> > Routing to peer
> >
> > 2013-10-28 21:27:23,286 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-13:null) Seq 1--1: MgmtId 233845174730253: Req:
> > Cancel request received
> >
> > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > (AgentManager-Handler-13:null) Seq 1-201916422: Cancelling.
> >
> > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 1-201916422: Waiting some more time because
> > this is the current command
> >
> > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 1-201916422: Waiting some more time because
> > this is the current command
> >
> > 2013-10-28 21:27:23,286 INFO  [utils.exception.CSExceptionErrorCode]
> > (StatsCollector-1:null) Could not find exception:
> > com.cloud.exception.OperationTimedoutException in error code list for
> > exceptions
> >
> > 2013-10-28 21:27:23,286 WARN  [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 1-201916422: Timed out on null
> >
> > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 1-201916422: Cancelling.
> >
> > 2013-10-28 21:27:23,288 DEBUG [cloud.storage.StorageManagerImpl]
> > (StatsCollector-1:null) Unable to send storage pool command to
> > Pool[200|LVM] via 1
> >
> > com.cloud.exception.OperationTimedoutException: Commands 201916422 to
> Host
> > 1 timed out after 3600
> >
> >         at
> com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> >
> >         at
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> >
> >         at
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> >
> >         at
> >
> >
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> >
> >         at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >
> >         at
> >
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> >
> >         at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >
> >         at java.lang.Thread.run(Thread.java:679)
> >
> > 2013-10-28 21:27:23,289 INFO  [cloud.server.StatsCollector]
> > (StatsCollector-1:null) Unable to reach Pool[200|LVM]
> >
> > com.cloud.exception.StorageUnavailableException: Resource
> [StoragePool:200]
> > is unreachable: Unable to send command to the pool
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> >
> >         at
> >
> >
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> >
> >         at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >
> >         at
> >
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> >
> >         at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >
> >         at java.lang.Thread.run(Thread.java:679)
> >
> > 2013-10-28 21:27:23,300 DEBUG [agent.manager.ClusteredAgentAttache]
> > (StatsCollector-2:null) Seq 2-1168703496: Forwarding null to
> > 233845174730255
> >
> > 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-14:null) Seq 2-1168703496: Routing from
> > 233845174730253
> >
> > 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-14:null) Seq 2-1168703496: Link is closed
> >
> > 2013-10-28 21:27:23,302 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-14:null) Seq 2-1168703496: MgmtId 233845174730253:
> > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> >
> > 2013-10-28 21:27:23,302 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-14:null) Seq 2--1: MgmtId 233845174730253: Req:
> > Routing to peer
> >
> > 2013-10-28 21:27:23,303 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-15:null) Seq 2--1: MgmtId 233845174730253: Req:
> > Cancel request received
> >
> > 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
> > (AgentManager-Handler-15:null) Seq 2-1168703496: Cancelling.
> >
> > 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time because
> > this is the current command
> >
> > 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time because
> > this is the current command
> >
> > 2013-10-28 21:27:23,304 INFO  [utils.exception.CSExceptionErrorCode]
> > (StatsCollector-2:null) Could not find exception:
> > com.cloud.exception.OperationTimedoutException in error code list for
> > exceptions
> >
> > 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 2-1168703496: Timed out on null
> >
> > 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-2:null) Seq 2-1168703496: Cancelling.
> >
> > 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentManagerImpl]
> > (StatsCollector-2:null) Operation timed out: Commands 1168703496 to Host
> 2
> > timed out after 3600
> >
> > 2013-10-28 21:27:23,304 WARN  [cloud.resource.ResourceManagerImpl]
> > (StatsCollector-2:null) Unable to obtain host 2 statistics.
> >
> > 2013-10-28 21:27:23,304 WARN  [cloud.server.StatsCollector]
> > (StatsCollector-2:null) Received invalid host stats for host: 2
> >
> > 2013-10-28 21:27:23,307 DEBUG [agent.manager.ClusteredAgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703497: Forwarding null to
> > 233845174730255
> >
> > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-1:null) Seq 2-1168703497: Routing from
> > 233845174730253
> >
> > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-1:null) Seq 2-1168703497: Link is closed
> >
> > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-1:null) Seq 2-1168703497: MgmtId 233845174730253:
> > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> >
> > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-1:null) Seq 2--1: MgmtId 233845174730253: Req:
> > Routing to peer
> >
> > 2013-10-28 21:27:23,310 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-5:null) Seq 2--1: MgmtId 233845174730253: Req:
> Cancel
> > request received
> >
> > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > (AgentManager-Handler-5:null) Seq 2-1168703497: Cancelling.
> >
> > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time because
> > this is the current command
> >
> > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time because
> > this is the current command
> >
> > 2013-10-28 21:27:23,310 INFO  [utils.exception.CSExceptionErrorCode]
> > (StatsCollector-1:null) Could not find exception:
> > com.cloud.exception.OperationTimedoutException in error code list for
> > exceptions
> >
> > 2013-10-28 21:27:23,310 WARN  [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703497: Timed out on null
> >
> > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703497: Cancelling.
> >
> > 2013-10-28 21:27:23,310 DEBUG [cloud.storage.StorageManagerImpl]
> > (StatsCollector-1:null) Unable to send storage pool command to
> > Pool[201|LVM] via 2
> >
> > com.cloud.exception.OperationTimedoutException: Commands 1168703497 to
> Host
> > 2 timed out after 3600
> >
> >         at
> com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> >
> >         at
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> >
> >         at
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> >
> >         at
> >
> >
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> >
> >         at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >
> >         at
> >
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> >
> >         at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >
> >         at java.lang.Thread.run(Thread.java:679)
> >
> > 2013-10-28 21:27:23,311 INFO  [cloud.server.StatsCollector]
> > (StatsCollector-1:null) Unable to reach Pool[201|LVM]
> >
> > com.cloud.exception.StorageUnavailableException: Resource
> [StoragePool:201]
> > is unreachable: Unable to send command to the pool
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> >
> >         at
> >
> >
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> >
> >         at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >
> >         at
> >
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> >
> >         at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >
> >         at java.lang.Thread.run(Thread.java:679)
> >
> > 2013-10-28 21:27:23,328 DEBUG [agent.manager.ClusteredAgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703498: Forwarding null to
> > 233845174730255
> >
> > 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-3:null) Seq 2-1168703498: Routing from
> > 233845174730253
> >
> > 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
> > (AgentManager-Handler-3:null) Seq 2-1168703498: Link is closed
> >
> > 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-3:null) Seq 2-1168703498: MgmtId 233845174730253:
> > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> >
> > 2013-10-28 21:27:23,330 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-3:null) Seq 2--1: MgmtId 233845174730253: Req:
> > Routing to peer
> >
> > 2013-10-28 21:27:23,331 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> > (AgentManager-Handler-4:null) Seq 2--1: MgmtId 233845174730253: Req:
> Cancel
> > request received
> >
> > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > (AgentManager-Handler-4:null) Seq 2-1168703498: Cancelling.
> >
> > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time because
> > this is the current command
> >
> > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time because
> > this is the current command
> >
> > 2013-10-28 21:27:23,331 INFO  [utils.exception.CSExceptionErrorCode]
> > (StatsCollector-1:null) Could not find exception:
> > com.cloud.exception.OperationTimedoutException in error code list for
> > exceptions
> >
> > 2013-10-28 21:27:23,332 WARN  [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703498: Timed out on null
> >
> > 2013-10-28 21:27:23,332 DEBUG [agent.manager.AgentAttache]
> > (StatsCollector-1:null) Seq 2-1168703498: Cancelling.
> >
> > 2013-10-28 21:27:23,332 DEBUG [cloud.storage.StorageManagerImpl]
> > (StatsCollector-1:null) Unable to send storage pool command to
> > Pool[202|NetworkFilesystem] via 2
> >
> > com.cloud.exception.OperationTimedoutException: Commands 1168703498 to
> Host
> > 2 timed out after 3600
> >
> >         at
> com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> >
> >         at
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> >
> >         at
> > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> >
> >         at
> >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> >
> >         at
> >
> >
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> >
> >         at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >
> >         at
> >
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> >
> >         at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> >
> >         at
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >
> >         at java.lang.Thread.run(Thread.java:679)
> >
> > iptables is disabled on the XS hosts, so the connection problem is not a
> > firewall issue.
> >
> > If I do an xe sr-list I see all 3 of the above SRs and the hosts have
> > mounted the NFS SR and can access it.
> >
> >
> >
> >
> > On Mon, Oct 28, 2013 at 9:05 PM, Carlos Reategui <carlos@reategui.com
> > >wrote:
> >
> > > Using CS 4.1.1 with 2 hosts running XS 6.0.2
> > >
> > > Had to shut everything down and now I am having problems bringing
> things
> > > up.
> > >
> > > As suggested I used CS to stop all my instances as well as the system
> VMs
> > > and the SR. Then I shutdown the XS 6.02 servers after enabling
> > maintenance
> > > mode from the CS console.
> > >
> > > After bringing things up, my XS servers had the infamous
> interface-rename
> > > issue which I resolved by editing the udev rules file manually.
> > >
> > > Now I have my XS servers up but for some reason my pool master got
> > changed
> > > so I used xe pool-designate-new-master to switch it back.
> > >
> > > I did not notice that this designation change had been picked up by CS
> > and
> > > when starting it up it keeps trying to connect to the wrong pool
> master.
> > >  Should I switch XS to match CS or what do I need to change in CS to
> tell
> > > it what the pool master is?
> > >
> > > I tried putting the server that CS thinks is the master in maintenance
> > > mode from CS but that just ends up in an apparent infinite cycle
> spitting
> > > out endless lines like these:
> > >
> > > 2013-10-28 20:39:02,059 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-1:null) Seq 2-855048230: Forwarding Seq
> > 2-855048230:
> > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
> > >
> > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > 2013-10-28 20:39:02,060 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-11:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flag
> > >
> > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > 2013-10-28 20:39:02,062 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-13:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flag
> > >
> > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > 2013-10-28 20:39:02,063 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-7:null) Seq 2-855048230: Forwarding Seq
> > 2-855048230:
> > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
> > >
> > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > 2013-10-28 20:39:02,064 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-15:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flag
> > >
> > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > 2013-10-28 20:39:02,066 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-14:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flag
> > >
> > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > 2013-10-28 20:39:02,067 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-2:null) Seq 2-855048230: Forwarding Seq
> > 2-855048230:
> > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags
> > >
> > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > 2013-10-28 20:39:02,068 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-12:null) Seq 2-855048230: Forwarding Seq
> > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flag
> > >
> > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > >
> > > After stopping and restarting the MS, the first error I see is:
> > >
> > > 2013-10-28 20:41:53,749 DEBUG [cloud.api.ApiServlet]
> > > (catalina-exec-1:null) ===START===  10.110.3.70 -- GET
> > >
> >
> command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88&response=json&sessi
> > >
> > > onkey=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624
> > >
> > > 2013-10-28 20:41:53,756 ERROR [cloud.api.ApiServlet]
> > > (catalina-exec-1:null) unknown exception writing api response
> > >
> > > java.lang.NullPointerException
> > >
> > >         at
> > >
> >
> com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.java:280)
> > >
> > >         at
> > >
> >
> com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.java:143)
> > >
> > >         at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:238)
> > >
> > >         at com.cloud.api.ApiServlet.doGet(ApiServlet.java:66)
> > >
> > >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
> > >
> > >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
> > >
> > >         at
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
> > >
> > >         at
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> > >
> > >         at
> > >
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> > >
> > >         at
> > >
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> > >
> > >         at
> > >
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> > >
> > >         at
> > >
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> > >
> > >         at
> > >
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:615)
> > >
> > >         at
> > >
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> > >
> > >         at
> > >
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> > >
> > >         at
> > >
> >
> org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
> > >
> > >         at
> > >
> >
> org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
> > >
> > >         at
> > >
> >
> org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2282)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 20:41:53,761 DEBUG [cloud.api.ApiServlet]
> > > (catalina-exec-1:null) ===END===  10.110.3.70 -- GET
> > >
> >
> command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88&response=json&session
> > >
> > > key=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624
> > >
> > > Then I see a few of these:
> > >
> > > 2013-10-28 20:42:01,464 WARN  [agent.manager.ClusteredAgentManagerImpl]
> > > (HA-Worker-4:work-10) Unable to connect to peer management server:
> > > 233845174730255, ip: 172.30.45.2 due to Connection refused
> > >
> > > java.net.ConnectException: Connection refused
> > >
> > >         at sun.nio.ch.Net.connect(Native Method)
> > >
> > >         at
> > sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
> > >
> > >         at java.nio.channels.SocketChannel.open(SocketChannel.java:164)
> > >
> > >         at
> > >
> >
> com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(ClusteredAgentManagerImpl.java:477)
> > >
> > >         at
> > >
> >
> com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttache.java:172)
> > >
> > >         at
> > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
> > >
> > >         at
> > >
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > >
> > >         at
> > >
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > >
> > >         at
> > >
> >
> com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigator.java:53)
> > >
> > >         at
> > >
> >
> com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:434)
> > >
> > >         at
> > >
> >
> com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:829)
> > >
> > > 2013-10-28 20:42:01,468 WARN  [agent.manager.ClusteredAgentManagerImpl]
> > > (HA-Worker-2:work-11) Unable to connect to peer management server:
> > > 233845174730255, ip: 172.30.45.2 due to Connection refused
> > >
> > > java.net.ConnectException: Connection refused
> > >
> > >         at sun.nio.ch.Net.connect(Native Method)
> > >
> > >         at
> > sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
> > >
> > >         at java.nio.channels.SocketChannel.open(SocketChannel.java:164)
> > >
> > >         at
> > >
> >
> com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(ClusteredAgentManagerImpl.java:477)
> > >
> > >         at
> > >
> >
> com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttache.java:172)
> > >
> > >         at
> > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
> > >
> > >         at
> > >
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > >
> > >         at
> > >
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > >
> > >         at
> > >
> >
> com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigator.java:53)
> > >
> > >         at
> > >
> >
> com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:434)
> > >
> > >         at
> > >
> >
> com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:829)
> > >
> > >
> > > The next error is:
> > >
> > > 2013-10-28 20:42:01,845 WARN  [utils.nio.Task]
> > > (AgentManager-Handler-6:null) Caught the following exception but
> pushing
> > on
> > >
> > > java.lang.NullPointerException
> > >
> > >         at
> > >
> >
> com.google.gson.FieldAttributes.getAnnotationFromArray(FieldAttributes.java:231)
> > >
> > >         at
> > > com.google.gson.FieldAttributes.getAnnotation(FieldAttributes.java:150)
> > >
> > >         at
> > >
> >
> com.google.gson.VersionExclusionStrategy.shouldSkipField(VersionExclusionStrategy.java:38)
> > >
> > >         at
> > >
> >
> com.google.gson.DisjunctionExclusionStrategy.shouldSkipField(DisjunctionExclusionStrategy.java:38)
> > >
> > >         at
> > >
> >
> com.google.gson.ReflectingFieldNavigator.visitFieldsReflectively(ReflectingFieldNavigator.java:58)
> > >
> > >         at
> > com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:120)
> > >
> > >         at
> > >
> >
> com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:62)
> > >
> > >         at
> > >
> >
> com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:53)
> > >
> > >         at com.google.gson.Gson.toJsonTree(Gson.java:220)
> > >
> > >         at com.google.gson.Gson.toJsonTree(Gson.java:197)
> > >
> > >         at
> > >
> >
> com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.java:56)
> > >
> > >         at
> > >
> >
> com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.java:37)
> > >
> > >         at
> > >
> >
> com.google.gson.JsonSerializationVisitor.findAndInvokeCustomSerializer(JsonSerializationVisitor.java:184)
> > >
> > >         at
> > >
> >
> com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:160)
> > >
> > >         at
> > com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:101)
> > >
> > >         at
> > >
> >
> com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:62)
> > >
> > >         at
> > >
> >
> com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:53)
> > >
> > >         at com.google.gson.Gson.toJsonTree(Gson.java:220)
> > >
> > >         at com.google.gson.Gson.toJson(Gson.java:260)
> > >
> > >         at com.cloud.agent.transport.Request.toBytes(Request.java:316)
> > >
> > >         at com.cloud.agent.transport.Request.getBytes(Request.java:332)
> > >
> > >         at
> > >
> >
> com.cloud.agent.manager.ClusteredAgentManagerImpl.cancel(ClusteredAgentManagerImpl.java:435)
> > >
> > >         at
> > >
> >
> com.cloud.agent.manager.ClusteredAgentManagerImpl$ClusteredAgentHandler.doTask(ClusteredAgentManagerImpl.java:641)
> > >
> > >         at com.cloud.utils.nio.Task.run(Task.java:83)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > and then the next set of errors I see over and over are:
> > >
> > > 2013-10-28 20:42:16,433 DEBUG [cloud.storage.StorageManagerImpl]
> > > (StatsCollector-2:null) Unable to send storage pool command to
> > > Pool[200|LVM] via 1
> > >
> > > com.cloud.exception.OperationTimedoutException: Commands 1112277002 to
> > > Host 1 timed out after 3600
> > >
> > >         at
> > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > >
> > >         at
> > >
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > >
> > >         at
> > >
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > >
> > >         at
> > >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
> > >
> > >         at
> > >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >
> > >         at
> > >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >
> > >         at
> > >
> >
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >
> > >         at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >
> > >         at
> > >
> >
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >
> > >         at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 20:42:16,434 INFO  [cloud.server.StatsCollector]
> > > (StatsCollector-2:null) Unable to reach Pool[200|LVM]
> > >
> > > com.cloud.exception.StorageUnavailableException: Resource
> > > [StoragePool:200] is unreachable: Unable to send command to the pool
> > >
> > >         at
> > >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
> > >
> > >         at
> > >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >
> > >         at
> > >
> >
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >
> > >         at
> > >
> >
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >
> > >         at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >
> > >         at
> > >
> >
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >
> > >         at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >
> > >         at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > I have tried to force reconnect to both hosts but that ends up maxing
> out
> > > a CPU core and filling up the log file with endless log lines.
> > >
> > > Any thoughts on how to recover my system?
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
>

Re: Management Server won't connect after cluster shutdown and restart

Posted by Ian Duffy <ia...@ianduffy.ie>.
Hi Carlos,

Did you ever find a fix for this?

I'm seeing the same issue on 4.1.1 with VMware ESXi.


On 29 October 2013 04:54, Carlos Reategui <cr...@gmail.com> wrote:

> Update.  I cleared out the async_job table and also reset the system VMs it
> thought were in Starting from my previous attempts by setting them to
> Stopped.  I also re-set the XS pool master to be the one XS thinks it is.
>
> Now when I start the CS MS here are the logs leading up to the first
> exception about the Unable to reach the pool:
>
> 2013-10-28 21:27:11,040 DEBUG [cloud.alert.ClusterAlertAdapter]
> (Cluster-Notification-1:null) Management server node 172.30.45.2 is up,
> send alert
>
> 2013-10-28 21:27:11,045 WARN  [cloud.cluster.ClusterManagerImpl]
> (Cluster-Notification-1:null) Notifying management server join event took 9
> ms
>
> 2013-10-28 21:27:23,236 DEBUG [cloud.server.StatsCollector]
> (StatsCollector-2:null) HostStatsCollector is running...
>
> 2013-10-28 21:27:23,243 DEBUG [cloud.server.StatsCollector]
> (StatsCollector-3:null) VmStatsCollector is running...
>
> 2013-10-28 21:27:23,247 DEBUG [cloud.server.StatsCollector]
> (StatsCollector-1:null) StorageCollector is running...
>
> 2013-10-28 21:27:23,255 DEBUG [cloud.server.StatsCollector]
> (StatsCollector-1:null) There is no secondary storage VM for secondary
> storage host nfs://172.30.45.2/store/secondary
>
> 2013-10-28 21:27:23,273 DEBUG [agent.manager.ClusteredAgentAttache]
> (StatsCollector-2:null) Seq 1-201916421: Forwarding null to 233845174730255
>
> 2013-10-28 21:27:23,274 DEBUG [agent.manager.ClusteredAgentAttache]
> (AgentManager-Handler-9:null) Seq 1-201916421: Routing from 233845174730253
>
> 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentAttache]
> (AgentManager-Handler-9:null) Seq 1-201916421: Link is closed
>
> 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-9:null) Seq 1-201916421: MgmtId 233845174730253: Req:
> Resource [Host:1] is unreachable: Host 1: Link is c
>
> losed
>
> 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-9:null) Seq 1--1: MgmtId 233845174730253: Req:
> Routing to peer
>
> 2013-10-28 21:27:23,277 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-11:null) Seq 1--1: MgmtId 233845174730253: Req:
> Cancel request received
>
> 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> (AgentManager-Handler-11:null) Seq 1-201916421: Cancelling.
>
> 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-2:null) Seq 1-201916421: Waiting some more time because
> this is the current command
>
> 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-2:null) Seq 1-201916421: Waiting some more time because
> this is the current command
>
> 2013-10-28 21:27:23,277 INFO  [utils.exception.CSExceptionErrorCode]
> (StatsCollector-2:null) Could not find exception:
> com.cloud.exception.OperationTimedoutException in error code list for
> exceptions
>
> 2013-10-28 21:27:23,277 WARN  [agent.manager.AgentAttache]
> (StatsCollector-2:null) Seq 1-201916421: Timed out on null
>
> 2013-10-28 21:27:23,278 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-2:null) Seq 1-201916421: Cancelling.
>
> 2013-10-28 21:27:23,278 WARN  [agent.manager.AgentManagerImpl]
> (StatsCollector-2:null) Operation timed out: Commands 201916421 to Host 1
> timed out after 3600
>
> 2013-10-28 21:27:23,278 WARN  [cloud.resource.ResourceManagerImpl]
> (StatsCollector-2:null) Unable to obtain host 1 statistics.
>
> 2013-10-28 21:27:23,278 WARN  [cloud.server.StatsCollector]
> (StatsCollector-2:null) Received invalid host stats for host: 1
>
> 2013-10-28 21:27:23,281 DEBUG [agent.manager.ClusteredAgentAttache]
> (StatsCollector-1:null) Seq 1-201916422: Forwarding null to 233845174730255
>
> 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
> (AgentManager-Handler-12:null) Seq 1-201916422: Routing from
> 233845174730253
>
> 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
> (AgentManager-Handler-12:null) Seq 1-201916422: Link is closed
>
> 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-12:null) Seq 1-201916422: MgmtId 233845174730253:
> Req: Resource [Host:1] is unreachable: Host 1: Link is
>
> closed
>
> 2013-10-28 21:27:23,284 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-12:null) Seq 1--1: MgmtId 233845174730253: Req:
> Routing to peer
>
> 2013-10-28 21:27:23,286 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-13:null) Seq 1--1: MgmtId 233845174730253: Req:
> Cancel request received
>
> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> (AgentManager-Handler-13:null) Seq 1-201916422: Cancelling.
>
> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-1:null) Seq 1-201916422: Waiting some more time because
> this is the current command
>
> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-1:null) Seq 1-201916422: Waiting some more time because
> this is the current command
>
> 2013-10-28 21:27:23,286 INFO  [utils.exception.CSExceptionErrorCode]
> (StatsCollector-1:null) Could not find exception:
> com.cloud.exception.OperationTimedoutException in error code list for
> exceptions
>
> 2013-10-28 21:27:23,286 WARN  [agent.manager.AgentAttache]
> (StatsCollector-1:null) Seq 1-201916422: Timed out on null
>
> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-1:null) Seq 1-201916422: Cancelling.
>
> 2013-10-28 21:27:23,288 DEBUG [cloud.storage.StorageManagerImpl]
> (StatsCollector-1:null) Unable to send storage pool command to
> Pool[200|LVM] via 1
>
> com.cloud.exception.OperationTimedoutException: Commands 201916422 to Host
> 1 timed out after 3600
>
>         at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>
>         at
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>
>         at
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>
>         at
>
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>
>         at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>
>         at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>
>         at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>
>         at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>
>         at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
>         at java.lang.Thread.run(Thread.java:679)
>
> 2013-10-28 21:27:23,289 INFO  [cloud.server.StatsCollector]
> (StatsCollector-1:null) Unable to reach Pool[200|LVM]
>
> com.cloud.exception.StorageUnavailableException: Resource [StoragePool:200]
> is unreachable: Unable to send command to the pool
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>
>         at
>
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>
>         at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>
>         at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>
>         at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>
>         at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>
>         at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
>         at java.lang.Thread.run(Thread.java:679)
>
> 2013-10-28 21:27:23,300 DEBUG [agent.manager.ClusteredAgentAttache]
> (StatsCollector-2:null) Seq 2-1168703496: Forwarding null to
> 233845174730255
>
> 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
> (AgentManager-Handler-14:null) Seq 2-1168703496: Routing from
> 233845174730253
>
> 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
> (AgentManager-Handler-14:null) Seq 2-1168703496: Link is closed
>
> 2013-10-28 21:27:23,302 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-14:null) Seq 2-1168703496: MgmtId 233845174730253:
> Req: Resource [Host:2] is unreachable: Host 2: Link is closed
>
> 2013-10-28 21:27:23,302 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-14:null) Seq 2--1: MgmtId 233845174730253: Req:
> Routing to peer
>
> 2013-10-28 21:27:23,303 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-15:null) Seq 2--1: MgmtId 233845174730253: Req:
> Cancel request received
>
> 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
> (AgentManager-Handler-15:null) Seq 2-1168703496: Cancelling.
>
> 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time because
> this is the current command
>
> 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time because
> this is the current command
>
> 2013-10-28 21:27:23,304 INFO  [utils.exception.CSExceptionErrorCode]
> (StatsCollector-2:null) Could not find exception:
> com.cloud.exception.OperationTimedoutException in error code list for
> exceptions
>
> 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentAttache]
> (StatsCollector-2:null) Seq 2-1168703496: Timed out on null
>
> 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-2:null) Seq 2-1168703496: Cancelling.
>
> 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentManagerImpl]
> (StatsCollector-2:null) Operation timed out: Commands 1168703496 to Host 2
> timed out after 3600
>
> 2013-10-28 21:27:23,304 WARN  [cloud.resource.ResourceManagerImpl]
> (StatsCollector-2:null) Unable to obtain host 2 statistics.
>
> 2013-10-28 21:27:23,304 WARN  [cloud.server.StatsCollector]
> (StatsCollector-2:null) Received invalid host stats for host: 2
>
> 2013-10-28 21:27:23,307 DEBUG [agent.manager.ClusteredAgentAttache]
> (StatsCollector-1:null) Seq 2-1168703497: Forwarding null to
> 233845174730255
>
> 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
> (AgentManager-Handler-1:null) Seq 2-1168703497: Routing from
> 233845174730253
>
> 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
> (AgentManager-Handler-1:null) Seq 2-1168703497: Link is closed
>
> 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-1:null) Seq 2-1168703497: MgmtId 233845174730253:
> Req: Resource [Host:2] is unreachable: Host 2: Link is closed
>
> 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-1:null) Seq 2--1: MgmtId 233845174730253: Req:
> Routing to peer
>
> 2013-10-28 21:27:23,310 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-5:null) Seq 2--1: MgmtId 233845174730253: Req: Cancel
> request received
>
> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> (AgentManager-Handler-5:null) Seq 2-1168703497: Cancelling.
>
> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time because
> this is the current command
>
> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time because
> this is the current command
>
> 2013-10-28 21:27:23,310 INFO  [utils.exception.CSExceptionErrorCode]
> (StatsCollector-1:null) Could not find exception:
> com.cloud.exception.OperationTimedoutException in error code list for
> exceptions
>
> 2013-10-28 21:27:23,310 WARN  [agent.manager.AgentAttache]
> (StatsCollector-1:null) Seq 2-1168703497: Timed out on null
>
> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-1:null) Seq 2-1168703497: Cancelling.
>
> 2013-10-28 21:27:23,310 DEBUG [cloud.storage.StorageManagerImpl]
> (StatsCollector-1:null) Unable to send storage pool command to
> Pool[201|LVM] via 2
>
> com.cloud.exception.OperationTimedoutException: Commands 1168703497 to Host
> 2 timed out after 3600
>
>         at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>
>         at
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>
>         at
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>
>         at
>
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>
>         at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>
>         at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>
>         at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>
>         at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>
>         at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
>         at java.lang.Thread.run(Thread.java:679)
>
> 2013-10-28 21:27:23,311 INFO  [cloud.server.StatsCollector]
> (StatsCollector-1:null) Unable to reach Pool[201|LVM]
>
> com.cloud.exception.StorageUnavailableException: Resource [StoragePool:201]
> is unreachable: Unable to send command to the pool
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>
>         at
>
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>
>         at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>
>         at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>
>         at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>
>         at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>
>         at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
>         at java.lang.Thread.run(Thread.java:679)
>
> 2013-10-28 21:27:23,328 DEBUG [agent.manager.ClusteredAgentAttache]
> (StatsCollector-1:null) Seq 2-1168703498: Forwarding null to
> 233845174730255
>
> 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
> (AgentManager-Handler-3:null) Seq 2-1168703498: Routing from
> 233845174730253
>
> 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
> (AgentManager-Handler-3:null) Seq 2-1168703498: Link is closed
>
> 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-3:null) Seq 2-1168703498: MgmtId 233845174730253:
> Req: Resource [Host:2] is unreachable: Host 2: Link is closed
>
> 2013-10-28 21:27:23,330 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-3:null) Seq 2--1: MgmtId 233845174730253: Req:
> Routing to peer
>
> 2013-10-28 21:27:23,331 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-4:null) Seq 2--1: MgmtId 233845174730253: Req: Cancel
> request received
>
> 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> (AgentManager-Handler-4:null) Seq 2-1168703498: Cancelling.
>
> 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time because
> this is the current command
>
> 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time because
> this is the current command
>
> 2013-10-28 21:27:23,331 INFO  [utils.exception.CSExceptionErrorCode]
> (StatsCollector-1:null) Could not find exception:
> com.cloud.exception.OperationTimedoutException in error code list for
> exceptions
>
> 2013-10-28 21:27:23,332 WARN  [agent.manager.AgentAttache]
> (StatsCollector-1:null) Seq 2-1168703498: Timed out on null
>
> 2013-10-28 21:27:23,332 DEBUG [agent.manager.AgentAttache]
> (StatsCollector-1:null) Seq 2-1168703498: Cancelling.
>
> 2013-10-28 21:27:23,332 DEBUG [cloud.storage.StorageManagerImpl]
> (StatsCollector-1:null) Unable to send storage pool command to
> Pool[202|NetworkFilesystem] via 2
>
> com.cloud.exception.OperationTimedoutException: Commands 1168703498 to Host
> 2 timed out after 3600
>
>         at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>
>         at
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>
>         at
> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>
>         at
>
> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>
>         at
>
> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>
>         at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>
>         at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>
>         at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>
>         at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>
>         at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
>         at java.lang.Thread.run(Thread.java:679)
>
> iptables is disabled on the XS hosts, so the connection problem is not a
> firewall issue.
>
> If I do an xe sr-list I see all 3 of the above SRs, and the hosts have
> mounted the NFS SR and can access it.
>

Re: Management Server won't connect after cluster shutdown and restart

Posted by Carlos Reategui <cr...@gmail.com>.
Update.  I cleared out the async_job table and also reset the system VMs that
were stuck in Starting from my previous attempts by setting them to Stopped.
I also re-set the XS pool master to be the one CS thinks it is.
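
In case it is useful to anyone hitting the same thing, the cleanup amounts to
roughly the following. This is only a sketch: the table and column names
assume the stock CloudStack 4.1 "cloud" schema, MySQL credentials are omitted,
the host UUID is a placeholder, and back up the database first.

  # back up the CloudStack database before touching anything
  mysqldump cloud > cloud-backup.sql

  # clear out the pending async jobs
  mysql cloud -e "DELETE FROM async_job"

  # flip the system VMs that were stuck in Starting back to Stopped
  # (table/column names assume the stock 4.1 schema)
  mysql cloud -e "UPDATE vm_instance SET state = 'Stopped'
                  WHERE state = 'Starting' AND type <> 'User'"

  # on the XS pool, re-designate the master that CloudStack expects
  # (the UUID below is a placeholder)
  xe pool-designate-new-master host-uuid=<uuid-of-expected-master>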

Now when I start the CS MS, here are the logs leading up to the first
exception about being unable to reach the pool:

2013-10-28 21:27:11,040 DEBUG [cloud.alert.ClusterAlertAdapter]
(Cluster-Notification-1:null) Management server node 172.30.45.2 is up,
send alert

2013-10-28 21:27:11,045 WARN  [cloud.cluster.ClusterManagerImpl]
(Cluster-Notification-1:null) Notifying management server join event took 9
ms

2013-10-28 21:27:23,236 DEBUG [cloud.server.StatsCollector]
(StatsCollector-2:null) HostStatsCollector is running...

2013-10-28 21:27:23,243 DEBUG [cloud.server.StatsCollector]
(StatsCollector-3:null) VmStatsCollector is running...

2013-10-28 21:27:23,247 DEBUG [cloud.server.StatsCollector]
(StatsCollector-1:null) StorageCollector is running...

2013-10-28 21:27:23,255 DEBUG [cloud.server.StatsCollector]
(StatsCollector-1:null) There is no secondary storage VM for secondary
storage host nfs://172.30.45.2/store/secondary

2013-10-28 21:27:23,273 DEBUG [agent.manager.ClusteredAgentAttache]
(StatsCollector-2:null) Seq 1-201916421: Forwarding null to 233845174730255

2013-10-28 21:27:23,274 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-9:null) Seq 1-201916421: Routing from 233845174730253

2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-9:null) Seq 1-201916421: Link is closed

2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-9:null) Seq 1-201916421: MgmtId 233845174730253: Req:
Resource [Host:1] is unreachable: Host 1: Link is closed

2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-9:null) Seq 1--1: MgmtId 233845174730253: Req:
Routing to peer

2013-10-28 21:27:23,277 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-11:null) Seq 1--1: MgmtId 233845174730253: Req:
Cancel request received

2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
(AgentManager-Handler-11:null) Seq 1-201916421: Cancelling.

2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
(StatsCollector-2:null) Seq 1-201916421: Waiting some more time because
this is the current command

2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
(StatsCollector-2:null) Seq 1-201916421: Waiting some more time because
this is the current command

2013-10-28 21:27:23,277 INFO  [utils.exception.CSExceptionErrorCode]
(StatsCollector-2:null) Could not find exception:
com.cloud.exception.OperationTimedoutException in error code list for
exceptions

2013-10-28 21:27:23,277 WARN  [agent.manager.AgentAttache]
(StatsCollector-2:null) Seq 1-201916421: Timed out on null

2013-10-28 21:27:23,278 DEBUG [agent.manager.AgentAttache]
(StatsCollector-2:null) Seq 1-201916421: Cancelling.

2013-10-28 21:27:23,278 WARN  [agent.manager.AgentManagerImpl]
(StatsCollector-2:null) Operation timed out: Commands 201916421 to Host 1
timed out after 3600

2013-10-28 21:27:23,278 WARN  [cloud.resource.ResourceManagerImpl]
(StatsCollector-2:null) Unable to obtain host 1 statistics.

2013-10-28 21:27:23,278 WARN  [cloud.server.StatsCollector]
(StatsCollector-2:null) Received invalid host stats for host: 1

2013-10-28 21:27:23,281 DEBUG [agent.manager.ClusteredAgentAttache]
(StatsCollector-1:null) Seq 1-201916422: Forwarding null to 233845174730255

2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-12:null) Seq 1-201916422: Routing from 233845174730253

2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-12:null) Seq 1-201916422: Link is closed

2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-12:null) Seq 1-201916422: MgmtId 233845174730253:
Req: Resource [Host:1] is unreachable: Host 1: Link is closed

2013-10-28 21:27:23,284 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-12:null) Seq 1--1: MgmtId 233845174730253: Req:
Routing to peer

2013-10-28 21:27:23,286 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-13:null) Seq 1--1: MgmtId 233845174730253: Req:
Cancel request received

2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
(AgentManager-Handler-13:null) Seq 1-201916422: Cancelling.

2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
(StatsCollector-1:null) Seq 1-201916422: Waiting some more time because
this is the current command

2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
(StatsCollector-1:null) Seq 1-201916422: Waiting some more time because
this is the current command

2013-10-28 21:27:23,286 INFO  [utils.exception.CSExceptionErrorCode]
(StatsCollector-1:null) Could not find exception:
com.cloud.exception.OperationTimedoutException in error code list for
exceptions

2013-10-28 21:27:23,286 WARN  [agent.manager.AgentAttache]
(StatsCollector-1:null) Seq 1-201916422: Timed out on null

2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
(StatsCollector-1:null) Seq 1-201916422: Cancelling.

2013-10-28 21:27:23,288 DEBUG [cloud.storage.StorageManagerImpl]
(StatsCollector-1:null) Unable to send storage pool command to
Pool[200|LVM] via 1

com.cloud.exception.OperationTimedoutException: Commands 201916422 to Host
1 timed out after 3600

        at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)

        at
com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)

        at
com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)

        at
com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)

        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

        at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)

        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)

        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)

        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)

        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)

        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

        at java.lang.Thread.run(Thread.java:679)

2013-10-28 21:27:23,289 INFO  [cloud.server.StatsCollector]
(StatsCollector-1:null) Unable to reach Pool[200|LVM]

com.cloud.exception.StorageUnavailableException: Resource [StoragePool:200]
is unreachable: Unable to send command to the pool

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)

        at
com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)

        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

        at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)

        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)

        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)

        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)

        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)

        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

        at java.lang.Thread.run(Thread.java:679)

2013-10-28 21:27:23,300 DEBUG [agent.manager.ClusteredAgentAttache]
(StatsCollector-2:null) Seq 2-1168703496: Forwarding null to 233845174730255

2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-14:null) Seq 2-1168703496: Routing from
233845174730253

2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-14:null) Seq 2-1168703496: Link is closed

2013-10-28 21:27:23,302 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-14:null) Seq 2-1168703496: MgmtId 233845174730253:
Req: Resource [Host:2] is unreachable: Host 2: Link is closed

2013-10-28 21:27:23,302 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-14:null) Seq 2--1: MgmtId 233845174730253: Req:
Routing to peer

2013-10-28 21:27:23,303 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-15:null) Seq 2--1: MgmtId 233845174730253: Req:
Cancel request received

2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
(AgentManager-Handler-15:null) Seq 2-1168703496: Cancelling.

2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
(StatsCollector-2:null) Seq 2-1168703496: Waiting some more time because
this is the current command

2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
(StatsCollector-2:null) Seq 2-1168703496: Waiting some more time because
this is the current command

2013-10-28 21:27:23,304 INFO  [utils.exception.CSExceptionErrorCode]
(StatsCollector-2:null) Could not find exception:
com.cloud.exception.OperationTimedoutException in error code list for
exceptions

2013-10-28 21:27:23,304 WARN  [agent.manager.AgentAttache]
(StatsCollector-2:null) Seq 2-1168703496: Timed out on null

2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
(StatsCollector-2:null) Seq 2-1168703496: Cancelling.

2013-10-28 21:27:23,304 WARN  [agent.manager.AgentManagerImpl]
(StatsCollector-2:null) Operation timed out: Commands 1168703496 to Host 2
timed out after 3600

2013-10-28 21:27:23,304 WARN  [cloud.resource.ResourceManagerImpl]
(StatsCollector-2:null) Unable to obtain host 2 statistics.

2013-10-28 21:27:23,304 WARN  [cloud.server.StatsCollector]
(StatsCollector-2:null) Received invalid host stats for host: 2

2013-10-28 21:27:23,307 DEBUG [agent.manager.ClusteredAgentAttache]
(StatsCollector-1:null) Seq 2-1168703497: Forwarding null to 233845174730255

2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-1:null) Seq 2-1168703497: Routing from 233845174730253

2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-1:null) Seq 2-1168703497: Link is closed

2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-1:null) Seq 2-1168703497: MgmtId 233845174730253:
Req: Resource [Host:2] is unreachable: Host 2: Link is closed

2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-1:null) Seq 2--1: MgmtId 233845174730253: Req:
Routing to peer

2013-10-28 21:27:23,310 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-5:null) Seq 2--1: MgmtId 233845174730253: Req: Cancel
request received

2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
(AgentManager-Handler-5:null) Seq 2-1168703497: Cancelling.

2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
(StatsCollector-1:null) Seq 2-1168703497: Waiting some more time because
this is the current command

2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
(StatsCollector-1:null) Seq 2-1168703497: Waiting some more time because
this is the current command

2013-10-28 21:27:23,310 INFO  [utils.exception.CSExceptionErrorCode]
(StatsCollector-1:null) Could not find exception:
com.cloud.exception.OperationTimedoutException in error code list for
exceptions

2013-10-28 21:27:23,310 WARN  [agent.manager.AgentAttache]
(StatsCollector-1:null) Seq 2-1168703497: Timed out on null

2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
(StatsCollector-1:null) Seq 2-1168703497: Cancelling.

2013-10-28 21:27:23,310 DEBUG [cloud.storage.StorageManagerImpl]
(StatsCollector-1:null) Unable to send storage pool command to
Pool[201|LVM] via 2

com.cloud.exception.OperationTimedoutException: Commands 1168703497 to Host
2 timed out after 3600

        at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)

        at
com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)

        at
com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)

        at
com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)

        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

        at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)

        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)

        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)

        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)

        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)

        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

        at java.lang.Thread.run(Thread.java:679)

2013-10-28 21:27:23,311 INFO  [cloud.server.StatsCollector]
(StatsCollector-1:null) Unable to reach Pool[201|LVM]

com.cloud.exception.StorageUnavailableException: Resource [StoragePool:201]
is unreachable: Unable to send command to the pool

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)

        at
com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)

        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

        at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)

        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)

        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)

        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)

        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)

        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

        at java.lang.Thread.run(Thread.java:679)

2013-10-28 21:27:23,328 DEBUG [agent.manager.ClusteredAgentAttache]
(StatsCollector-1:null) Seq 2-1168703498: Forwarding null to 233845174730255

2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-3:null) Seq 2-1168703498: Routing from 233845174730253

2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-3:null) Seq 2-1168703498: Link is closed

2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-3:null) Seq 2-1168703498: MgmtId 233845174730253:
Req: Resource [Host:2] is unreachable: Host 2: Link is closed

2013-10-28 21:27:23,330 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-3:null) Seq 2--1: MgmtId 233845174730253: Req:
Routing to peer

2013-10-28 21:27:23,331 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-4:null) Seq 2--1: MgmtId 233845174730253: Req: Cancel
request received

2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
(AgentManager-Handler-4:null) Seq 2-1168703498: Cancelling.

2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
(StatsCollector-1:null) Seq 2-1168703498: Waiting some more time because
this is the current command

2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
(StatsCollector-1:null) Seq 2-1168703498: Waiting some more time because
this is the current command

2013-10-28 21:27:23,331 INFO  [utils.exception.CSExceptionErrorCode]
(StatsCollector-1:null) Could not find exception:
com.cloud.exception.OperationTimedoutException in error code list for
exceptions

2013-10-28 21:27:23,332 WARN  [agent.manager.AgentAttache]
(StatsCollector-1:null) Seq 2-1168703498: Timed out on null

2013-10-28 21:27:23,332 DEBUG [agent.manager.AgentAttache]
(StatsCollector-1:null) Seq 2-1168703498: Cancelling.

2013-10-28 21:27:23,332 DEBUG [cloud.storage.StorageManagerImpl]
(StatsCollector-1:null) Unable to send storage pool command to
Pool[202|NetworkFilesystem] via 2

com.cloud.exception.OperationTimedoutException: Commands 1168703498 to Host
2 timed out after 3600

        at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)

        at
com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)

        at
com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)

        at
com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)

        at
com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)

        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

        at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)

        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)

        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)

        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)

        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)

        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

        at java.lang.Thread.run(Thread.java:679)

iptables is disabled on the XS hosts, so the connection problem is not a
firewall issue.

If I do an xe sr-list I see all 3 of the above SRs, and the hosts have
mounted the NFS SR and can access it.
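
For completeness, the checks on each XS host boil down to something like the
following (standard XenServer 6.0.2 dom0 commands; the SR UUID is a
placeholder):

  # confirm dom0 has no active firewall rules
  iptables -L -n

  # list the SRs the pool knows about
  xe sr-list params=uuid,name-label,type

  # verify the NFS SR is plugged and attached on this host
  xe pbd-list sr-uuid=<nfs-sr-uuid> params=host-uuid,currently-attached

  # and that the NFS mount itself is visible
  mount | grep nfs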



