Posted to dev@cloudstack.apache.org by to...@163.com on 2015/07/23 04:13:05 UTC

Fwd: Re: XenServer is disconnected after CS hosts shutdown

Copying this mail thread to @dev to seek more help.


-------- Forwarded Message --------
Subject: 	Re: XenServer is disconnected after CS hosts shutdown
Date: 	Wed, 22 Jul 2015 21:03:13 +0800
From: 	tony_caotong@163.com
Reply-To: 	users@cloudstack.apache.org
To: 	users@cloudstack.apache.org, opsrunbook@gmail.com



Hey! Help please...

Some news.
I think the cause is that the ACS host can't communicate with the XenServer
host.
ACS keeps outputting logs like this:

2015-07-22 20:42:13,555 DEBUG [c.c.a.m.ClusteredAgentAttache]
(AgentManager-Handler-7:null) Seq 5-8174877748607582212: Forwarding Seq
5-8174877748607582212:  { Cmd , MgmtId: 279278805451459, via: 5, Ver:
v1, Flags: 100111, [{"com.cloud.agent.api.MaintainCommand":{"wait":0}}]
} to 280345368052992
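
To check whether the same command really is being forwarded over and over, the log can be grouped by sequence number. The following is only a minimal sketch: it assumes the line layout shown in the excerpt above, and the function name count_forwarded is made up for illustration.

```python
import re
from collections import Counter

# "Seq <host id>-<sequence>: Forwarding", as it appears in the excerpt above.
SEQ_RE = re.compile(r"Seq (\d+)-(\d+): Forwarding")
# First command class name in the JSON payload, e.g. com.cloud.agent.api.MaintainCommand.
CMD_RE = re.compile(r'\[\{"([\w.]+)"')
# Zero-width split point in front of every log timestamp.
ENTRY_RE = re.compile(r"(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} )")


def count_forwarded(log_text):
    """Count how often each (host id, sequence, command) is forwarded."""
    # Entries are wrapped over several lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    counts = Counter()
    for entry in ENTRY_RE.split(flat):
        seq = SEQ_RE.search(entry)
        cmd = CMD_RE.search(entry)
        if seq and cmd:
            short_name = cmd.group(1).rsplit(".", 1)[-1]
            counts[(int(seq.group(1)), seq.group(2), short_name)] += 1
    return counts
```

A key with a large count would mean that one command is being re-forwarded rather than many distinct commands being sent.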

I am not sure whether the ACS status is wrong or some services on
XenServer have not been started.

On the XenServer host, I found that *xenheartbeat.sh is not running.*
*(/bin/bash /opt/cloud/bin/xenheartbeat.sh
00d8e0d0-8561-4b3d-9044-cbc496ff22cc 120 60)*
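
One way to repeat this check is to scan a process listing for the script name. This is only a sketch: the helper name heartbeat_lines is hypothetical, and it assumes the listing was captured on the XenServer host beforehand (for example with `ps -eo args`) and is passed in as text.

```python
def heartbeat_lines(ps_output, script="xenheartbeat.sh"):
    """Return the process-listing lines that mention the heartbeat script."""
    # Plain substring match on each line; an empty list means the script
    # does not appear in the captured listing.
    return [line.strip() for line in ps_output.splitlines() if script in line]
```

An empty result matches what is described above: no xenheartbeat.sh process on the host.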

As some operations on the XenServer host were pending, it could not be
deleted from the web UI.

I found a temporary workaround:

1. delete jobs from DB cloud.vm_work_job.
2. delete xenserver from DB cloud.host.
3. add xenserver host back from web UI.

then it works.
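
The order of the two database steps can be sketched as follows. This is a toy stand-in, not the real schema: it uses an in-memory sqlite3 database instead of the actual cloud database, the columns are invented, only the table names vm_work_job and host come from the steps above, and it assumes that all rows in vm_work_job are meant to be removed.

```python
import sqlite3


def make_toy_db():
    """Build an in-memory stand-in with hypothetical columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vm_work_job (id INTEGER PRIMARY KEY, cmd TEXT)")
    conn.execute("CREATE TABLE host (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO vm_work_job (cmd) VALUES (?)",
                     [("job-a",), ("job-b",)])
    conn.executemany("INSERT INTO host (id, name) VALUES (?, ?)",
                     [(4, "xenserver"), (5, "kvm")])
    conn.commit()
    return conn


def remove_host_records(conn, host_id):
    """Delete the pending jobs first, then the host row, in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("DELETE FROM vm_work_job")
        conn.execute("DELETE FROM host WHERE id = ?", (host_id,))
    jobs_left = conn.execute("SELECT COUNT(*) FROM vm_work_job").fetchone()[0]
    hosts_left = conn.execute(
        "SELECT COUNT(*) FROM host WHERE id = ?", (host_id,)).fetchone()[0]
    return jobs_left, hosts_left
```

Running both deletes in a single transaction means a failure in the second statement also undoes the first, so the tables are never left half-cleaned.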

Does anyone have an idea about this?

Could anyone tell me what ACS does on a XenServer host when the host is
added?

Thanks,

-----------
Cao Tong

On 07/22/2015 04:26 PM, tony_caotong@163.com wrote:
>
> @prashant, the following are the answers to your questions.
>
> 1. Yes, primary storage is connected fine for my xenserver.
>
> 2. No, Xenserver's password is not changed.
>
> 3. yes, web UI is fine, and I can login.
>
> 4. Before the reboot, I unmanaged and disabled the resources, and after
> the reboot I enabled all of them.
>
> 5. The host's state is Up.
>
> 6. No yum update was run anywhere.
>
> 7. The system VMs' status is fine, I think.
>
> -----------
> Cao Tong
>
> On 07/22/2015 04:13 PM, tony_caotong@163.com wrote:
>>
>> Hi,
>>
>> After reinstalling, I got the problem again.
>>
>> So, I will describe it once again.
>>
>> What my environment looks like:
>>
>> I have an ACS server host and a XenServer host. After both were rebooted, I
>> cannot create a VM on XenServer through ACS.
>> KVM and NFS are running together on the ACS manager host.
>>
>> The status of a new VM always stays at 'Starting' in the web UI, but I can
>> create a new VM using XenCenter.
>>
>> ------------- ERR LOGS ----------
>> 2015-07-22 15:56:56,357 DEBUG [c.c.s.StorageManagerImpl]
>> (StatsCollector-3:ctx-1aa2e8c9) Unable to send storage pool command
>> to Pool[4|NetworkFilesystem] via 4
>> com.cloud.exception.OperationTimedoutException: Commands
>> 2829104990918803478 to Host 4 timed out after 3600
>>
>> 2015-07-22 15:56:56,358 INFO  [c.c.s.StatsCollector]
>> (StatsCollector-3:ctx-1aa2e8c9) Unable to reach
>> Pool[4|NetworkFilesystem]
>> com.cloud.exception.StorageUnavailableException: Resource
>> [StoragePool:4] is unreachable: Unable to send command to the pool
>>
>>
>> ------------- and there are lots of DEBUG messages, repeated again
>> and again -----------
>>
>> 2015-07-22 15:36:12,887 DEBUG [c.c.a.m.ClusteredAgentAttache]
>> (AgentManager-Handler-14:null) Seq 4-8064821032713715922: Forwarding
>> Seq 4-8064821032713715922:  { Cmd , MgmtId: 227448510156211, via: 4,
>> Ver: v1, Flags: 100111,
>> [{"com.cloud.agent.api.MaintainCommand":{"wait":0}}] } to
>> 116784073679673
>> 2015-07-22 15:36:12,889 DEBUG [c.c.a.m.ClusteredAgentAttache]
>> (AgentManager-Handler-10:null) Seq 4-8064821032713715883: Forwarding
>> Seq 4-8064821032713715883:  { Cmd , MgmtId: 227448510156211, via: 4,
>> Ver: v1, Flags: 100111,
>> [{"org.apache.cloudstack.storage.command.CopyCommand":{"srcTO":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"path":"template/tmpl/1/5/af949612-838f-3a6d-931b-312e612db740.vhd","origUrl":"http://download.cloud.com/templates/builtin/centos56-x86_64.vhd.bz2","uuid":"80b60e46-3017-11e5-8736-00259091a13a","id":5,"format":"VHD","accountId":1,"checksum":"905cec879afd9c9d22ecc8036131a180","hvm":false,"displayText":"CentOS
>> 5.6(64-bit) no GUI
>> (XenServer)","imageDataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://10.0.0.100/storage/secondary","_role":"Image"}},"name":"centos56-x86_64-xen","hypervisorType":"XenServer"}},"destTO":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"origUrl":"http://download.cloud.com/templates/builtin/centos56-x86_64.vhd.bz2","uuid":"80b60e46-3017-11e5-8736-00259091a13a","id":5,"format":"VHD","accountId":1,"checksum":"905cec879afd9c9d22ecc8036131a180","hvm":false,"displayText":"CentOS
>> 5.6(64-bit) no GUI
>> (XenServer)","imageDataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"2df26406-31bf-3a95-8a61-f5008defd9a0","id":4,"poolType":"NetworkFilesystem","host":"10.0.0.100","path":"/storage/xen/primary","port":2049,"url":"NetworkFilesystem://10.0.0.100/storage/xen/primary/?ROLE=Primary&STOREUUID=2df26406-31bf-3a95-8a61-f5008defd9a0"}},"name":"centos56-x86_64-xen","hypervisorType":"XenServer"}},"executeInSequence":true,"options":{},"wait":10800}}]
>> } to 116784073679673
>>
>>
>> -----------------------------------------
>>
>> Does anyone have any ideas? Thanks.
>>
>> -----------
>> Cao Tong
>>
>> On 07/21/2015 06:14 PM, tony_caotong@163.com wrote:
>>>
>>> Thanks all,
>>>
>>> I have already reinstalled my hosts to prepare a new, clean
>>> environment and restart my research.
>>>
>>> -----------
>>> Cao Tong
>>>
>>> On 07/20/2015 09:24 PM, Prashant s wrote:
>>>> Some questions:
>>>>
>>>> Can you please tell ...
>>>>
>>>> 1. Is your NFS storage or your primary Storage Repository in connected
>>>> mode, with no red cross mark on it in XenCenter?
>>>> 2. Did you change any passwords on the XenServers?
>>>> 3. Is the CloudStack web UI up? Can you log in to the CloudStack
>>>> web page?
>>>> 4. Are the zone, pod, or clusters in an unmanaged or disabled state?
>>>> 5. Are all the hosts in the connected state?
>>>> 6. Did you run yum update on host reboot on the CS manager VM?
>>>> 7. System VMs are stateless; you can kill them and CS will recreate
>>>> new ones, so don't worry :-)
>>>>
>>>>
>>>> Thanks,
>>>> Prashant
>>>>
>>>>
>>>>
>>>> On Mon, Jul 20, 2015 at 3:47 AM, <to...@163.com> wrote:
>>>>
>>>>> Hi, I restarted all hosts (one manager and one XenServer) again.
>>>>>
>>>>>
>>>>> Following is the error log.
>>>>>
>>>>>
>>>>> 2015-07-20 15:33:49,688 INFO [c.c.u.e.CSExceptionErrorCode]
>>>>> (StatsCollector-3:ctx-692a5392) Could not find exception:
>>>>> com.cloud.exception.OperationTimedoutException in error code list for
>>>>> exceptions
>>>>> 2015-07-20 15:33:49,688 WARN  [c.c.a.m.AgentAttache]
>>>>> (StatsCollector-3:ctx-692a5392) Seq 1-3176445112179752972: Timed
>>>>> out on null
>>>>> 2015-07-20 15:33:49,689 DEBUG [c.c.a.m.AgentAttache]
>>>>> (StatsCollector-3:ctx-692a5392) Seq 1-3176445112179752972:
>>>>> Cancelling.
>>>>> 2015-07-20 15:33:49,689 DEBUG [c.c.s.StorageManagerImpl]
>>>>> (StatsCollector-3:ctx-692a5392) Unable to send storage pool
>>>>> command to
>>>>> Pool[1|NetworkFilesystem] via 1
>>>>> com.cloud.exception.OperationTimedoutException: Commands
>>>>> 3176445112179752972 to Host 1 timed out after 3600
>>>>>          at
>>>>> com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:436)
>>>>>          at
>>>>> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:433)
>>>>>
>>>>>          at
>>>>> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:362)
>>>>>
>>>>>          at
>>>>> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:1000)
>>>>>
>>>>>          at
>>>>> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:392)
>>>>>
>>>>>          at
>>>>> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:406)
>>>>>
>>>>>          at
>>>>> com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:642)
>>>>>
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>>>>>
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>>>>>
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>>>>>
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>>>>>
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>>>>>
>>>>>          at
>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>
>>>>>          at
>>>>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>>>          at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>>>
>>>>>          at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>>>
>>>>>          at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>
>>>>>          at java.lang.Thread.run(Thread.java:745)
>>>>> 2015-07-20 15:33:49,689 INFO  [c.c.s.StatsCollector]
>>>>> (StatsCollector-3:ctx-692a5392) Unable to reach
>>>>> Pool[1|NetworkFilesystem]
>>>>> com.cloud.exception.StorageUnavailableException: Resource
>>>>> [StoragePool:1]
>>>>> is unreachable: Unable to send command to the pool
>>>>>          at
>>>>> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:1010)
>>>>>
>>>>>          at
>>>>> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:392)
>>>>>
>>>>>          at
>>>>> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:406)
>>>>>
>>>>>          at
>>>>> com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:642)
>>>>>
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>>>>>
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>>>>>
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>>>>>
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>>>>>
>>>>>          at
>>>>> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>>>>>
>>>>>          at
>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>
>>>>>          at
>>>>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>>>          at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>>>
>>>>>          at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>>>
>>>>>          at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>
>>>>>          at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>
>>>>>          at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> -----------
>>>>> Cao Tong
>>>>>
>>>>>
>>>>> On 07/20/2015 02:52 PM, tony_caotong@163.com wrote:
>>>>>
>>>>>> No, no IP address was changed.
>>>>>>
>>>>>> 1. On XenServer I cannot log in to the system VMs using an internal IP
>>>>>> like '169.254.1.112'. There should be a bridge network for this,
>>>>>> right? It is gone.
>>>>>>
>>>>>> 2. I tried to delete the XenServer host from CS in the web UI. It also
>>>>>> failed with lots of logs like the following, then memory filled up and
>>>>>> the management server went down...
>>>>>>
>>>>>> 2015-07-20 14:47:30,580 DEBUG [c.c.a.m.ClusteredAgentAttache]
>>>>>> (AgentManager-Handler-15:null) Seq 1-7282039122481381399:
>>>>>> Forwarding Seq
>>>>>> 1-7282039122481381399:  { Cmd , MgmtId: 104062526015411, via: 1,
>>>>>> Ver: v1,
>>>>>> Flags: 100111,
>>>>>> [{"com.cloud.agent.api.MaintainCommand":{"wait":0}}] } to
>>>>>> 192405008094602
>>>>>> 2015-07-20 14:47:30,582 DEBUG [c.c.a.m.ClusteredAgentAttache]
>>>>>> (AgentManager-Handler-5:null) Seq 1-7282039122481381399:
>>>>>> Forwarding Seq
>>>>>> 1-7282039122481381399:  { Cmd , MgmtId: 104062526015411, via: 1,
>>>>>> Ver: v1,
>>>>>> Flags: 100111,
>>>>>> [{"com.cloud.agent.api.MaintainCommand":{"wait":0}}] } to
>>>>>> 192405008094602
>>>>>> 2015-07-20 14:47:30,583 DEBUG [c.c.a.m.ClusteredAgentAttache]
>>>>>> (AgentManager-Handler-1:null) Seq 1-7282039122481381399:
>>>>>> Forwarding Seq
>>>>>> 1-7282039122481381399:  { Cmd , MgmtId: 104062526015411, via: 1,
>>>>>> Ver: v1,
>>>>>> Flags: 100111,
>>>>>> [{"com.cloud.agent.api.MaintainCommand":{"wait":0}}] } to
>>>>>> 192405008094602
>>>>>> 2015-07-20 14:47:30,584 DEBUG [c.c.a.m.ClusteredAgentAttache]
>>>>>> (AgentManager-Handler-14:null) Seq 1-7282039122481381399:
>>>>>> Forwarding Seq
>>>>>> 1-7282039122481381399:  { Cmd , MgmtId: 104062526015411, via: 1,
>>>>>> Ver: v1,
>>>>>> Flags: 100111,
>>>>>> [{"com.cloud.agent.api.MaintainCommand":{"wait":0}}] } to
>>>>>> 192405008094602
>>>>>>
>>>>>>
>>>>>> My guess: is some service or daemon that works for CS not up
>>>>>> on XenServer?
>>>>>>
>>>>>>
>>>>>> -----------
>>>>>> Cao Tong
>>>>>> On 07/20/2015 02:35 PM, Rajani Karuturi wrote:
>>>>>>
>>>>>>> Did the management server IP change?
>>>>>>> The management server IP in the configuration table is used by the
>>>>>>> system VMs.
>>>>>>> select * from configuration where name like 'host';
>>>>>>>
>>>>>>> If it changed, correct the value in the DB and restart the system VMs.
>>>>>>>
>>>>>>>
>>>>>>> ~Rajani
>>>>>>>
>>>>>>> On Mon, Jul 20, 2015 at 11:56 AM,<to...@163.com>  wrote:
>>>>>>>
>>>>>>>   Hello,
>>>>>>>> I shut down my CS manager and XenServer last weekend, and now
>>>>>>>> the SSVM and CPVM are disconnected; those two were running on
>>>>>>>> XenServer. So what should I do right now?
>>>>>>>> Could anybody please help me? Thanks.
>>>>>>>>
>>>>>>>> On XenServer I found that the three system VMs are not running.
>>>>>>>> My XenServer seems unable to reconnect to the CS manager, and it
>>>>>>>> seems not to be under the control of CS.
>>>>>>>>
>>>>>>>>
>>>>>>>> What are the right steps to shut down all machines in a CS group and
>>>>>>>> resume them?
>>>>>>>> How can I get my XenServer reconnected?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> --
>>>>>>>> -----------
>>>>>>>> Cao Tong
>>>>>>>>