Posted to dev@cloudstack.apache.org by Caleb Call <cc...@overstock.com> on 2013/02/11 23:17:50 UTC

Xenserver Host unable to reconnect

We have a zone with a single host in it.  We also recently upgraded from 3.0.2 to 4.0 (this may not be relevant, but I figured I'd mention it anyway).  We put the host in maintenance mode (all VMs were shut down, etc.) and applied some pending patches.  After coming back up, the host is unable to reconnect.  When I try to force a reconnect, I get the following in the management log:

2013-02-11 15:04:34,541 DEBUG [ehcache.store.MemoryStore] (catalina-exec-19:null) UserDaoCache: UserDaoMemoryStore hit for 10
2013-02-11 15:04:34,578 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-19:null) submit async job-4806, details: AsyncJobVO {id:4806, userId: 10, accountId: 7, sessionKey: null, instanceType: Host, instanceId: 25, cmd: com.cloud.api.commands.ReconnectHostCmd, cmdOriginator: null, cmdInfo: {"id":"6bc87ba4-52d4-4477-a417-46886d03698d","response":"json","sessionkey":"6IzB5H0fVA9f9FgdWtbNG9GdB5E\u003d","ctxUserId":"10","_":"1360620274351","ctxAccountId":"7","ctxStartEventId":"15461"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 145320940120008, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
2013-02-11 15:04:34,579 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Executing com.cloud.api.commands.ReconnectHostCmd for job-4806
2013-02-11 15:04:34,587 INFO  [agent.manager.AgentManagerImpl] (Job-Executor-3:job-4806) Unable to disconnect host because it is not connected to this server: 25
2013-02-11 15:04:34,587 WARN  [api.commands.ReconnectHostCmd] (Job-Executor-3:job-4806) Exception:
com.cloud.api.ServerApiException
        at com.cloud.api.commands.ReconnectHostCmd.execute(ReconnectHostCmd.java:108)
        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
        at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:432)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
2013-02-11 15:04:34,587 WARN  [cloud.api.ApiDispatcher] (Job-Executor-3:job-4806) class com.cloud.api.ServerApiException : null
2013-02-11 15:04:34,588 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Complete async job-4806, jobStatus: 2, resultCode: 530, result: Error Code: 534 Error text: null
2013-02-11 15:04:39,624 DEBUG [ehcache.store.MemoryStore] (catalina-exec-17:null) UserDaoCache: UserDaoMemoryStore hit for 10
2013-02-11 15:04:39,635 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-17:null) Async job-4806 completed


I can't find anywhere in the logs where it's trying to reconnect on its own (besides the force reconnect).  I do see where it acknowledges the Alert state for the host, but it doesn't give any reason why.

The only indication I can see that it's even trying is these lines:

2013-02-11 11:47:05,670 DEBUG [xen.resource.XenServerConnectionPool] (ClusteredAgentManager Timer:null) Failed to slave local login to 10.5.1.14
2013-02-11 11:47:05,671 WARN  [cloud.resource.DiscovererBase] (ClusteredAgentManager Timer:null) Unable to configure resource due to Can not create slave connection to 10.5.1.14

10.5.1.14 is the host that should be reconnecting but is not.
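
One way to look for further clues is to pull every WARN-or-higher line, plus any line that mentions the host's IP, out of the management log. The following is only a minimal sketch, assuming the log format shown above (timestamp, level, bracketed logger name); the function name `interesting_lines` is made up for illustration and is not part of CloudStack.

```python
import re

# Matches lines such as:
# "2013-02-11 11:47:05,671 WARN  [cloud.resource.DiscovererBase] (...) message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+(?P<rest>.*)$"
)


def interesting_lines(lines, host_ip, levels=("WARN", "ERROR")):
    """Yield (timestamp, level, logger, rest) for lines that either have one
    of the given levels or mention host_ip in the message part."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # e.g. stack-trace continuation lines
        if m.group("level") in levels or host_ip in m.group("rest"):
            yield m.group("ts"), m.group("level"), m.group("logger"), m.group("rest")


# Sample lines copied from the log excerpts in this message.
sample = [
    "2013-02-11 11:47:05,670 DEBUG [xen.resource.XenServerConnectionPool] (ClusteredAgentManager Timer:null) Failed to slave local login to 10.5.1.14",
    "2013-02-11 11:47:05,671 WARN  [cloud.resource.DiscovererBase] (ClusteredAgentManager Timer:null) Unable to configure resource due to Can not create slave connection to 10.5.1.14",
    "2013-02-11 15:04:34,541 DEBUG [ehcache.store.MemoryStore] (catalina-exec-19:null) UserDaoCache: UserDaoMemoryStore hit for 10",
]

for ts, level, logger, rest in interesting_lines(sample, "10.5.1.14"):
    print(ts, level, logger)
```

To run it against a real log, pass an open file object instead of `sample`; the path to the management server log depends on the installation.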

Anything else I can look at as to why it's not connecting?  Any suggestions on why my host won't reconnect?

Thanks


________________________________

CONFIDENTIALITY NOTICE: This message is intended only for the use and review of the individual or entity to which it is addressed and may contain information that is privileged and confidential. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivering the message solely to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify sender immediately by telephone or return email. Thank you.

Re: Xenserver Host unable to reconnect

Posted by Caleb Call <ca...@me.com>.

So, we're not sure what we did, but something happened and it allowed us to re-add it.  We tried adding another temporary host to this zone, but that failed, then we were able to re-add this host and it worked.

I also found something.  I'm not sure whether this change was intentional, but since upgrading to 4.0 the CPU MHz setting seems to be enforced more strictly.  Our CPUs are 2.2 GHz and there are 32 of them, so the host reports 70400 MHz as available.  Our compute offering was set to allocate 2 cores at 2700 MHz per core.  It seems that because each individual core is only 2200 MHz, which is less than the 2700 MHz per core the offering asks for, the host is rejected instead of its total MHz being treated as a resource pool.  Deployment would fail and report no suitable hosts because the CPU MHz check was false.  Mind you, these same compute offerings worked fine in 3.0.2.  I created new offerings with everything the same except only 2000 MHz per core, and now our VMs start up with no problem.
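
The difference between the two behaviours can be sketched roughly as follows. This is not CloudStack's allocator code, only an illustration of the arithmetic described above; the function names `fits_pooled` and `fits_per_core` are hypothetical, and the numbers are the ones from this host and these offerings.

```python
# Illustrative only: NOT CloudStack's allocator code.
# Function names are hypothetical; numbers come from the message above.

def fits_pooled(host_cores, host_mhz_per_core, vm_cores, vm_mhz_per_core):
    """Pooled check: total requested MHz must not exceed total host MHz."""
    return vm_cores * vm_mhz_per_core <= host_cores * host_mhz_per_core


def fits_per_core(host_cores, host_mhz_per_core, vm_cores, vm_mhz_per_core):
    """Per-core check: each requested core must fit on one physical core."""
    return vm_cores <= host_cores and vm_mhz_per_core <= host_mhz_per_core


HOST_CORES, HOST_MHZ = 32, 2200  # 32 x 2200 MHz = 70400 MHz in total

old_offering = (2, 2700)  # 2 cores at 2700 MHz each
new_offering = (2, 2000)  # 2 cores at 2000 MHz each

print(fits_pooled(HOST_CORES, HOST_MHZ, *old_offering))    # True
print(fits_per_core(HOST_CORES, HOST_MHZ, *old_offering))  # False
print(fits_per_core(HOST_CORES, HOST_MHZ, *new_offering))  # True
```

Under the pooled check the old offering fits easily (5400 MHz out of 70400 MHz), while under the per-core check it is rejected because 2700 MHz exceeds a single 2200 MHz core, which matches the "no suitable hosts" result seen after the upgrade.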



On Feb 11, 2013, at 11:53 PM, Nitin Mehta <Ni...@citrix.com> wrote:

> Would it not interfere when you are adding it back? Do you not have to
> set the removed column in the host table?
> 
> On 12/02/13 12:00 PM, "Devdeep Singh" <de...@citrix.com> wrote:
> 
>> Hi Caleb,
>> 
>> Do you have any instances running on the host? If not, then removing it
>> shouldn't cause any problems. Can you also check whether xapi is running
>> on the host?
>> 
>> Regards,
>> Devdeep
>> 
>>> -----Original Message-----
>>> From: Caleb Call [mailto:calebcall@me.com]
>>> Sent: Tuesday, February 12, 2013 5:37 AM
>>> To: cloudstack-users@incubator.apache.org
>>> Cc: 'Caleb Call'; <ae...@gmail.com>; cloudstack-dev@incubator.apache.org
>>> Subject: Re: Xenserver Host unable to reconnect
>>> 
>>> Tried this but still nothing.  As I mentioned before, the mgmt_server_id
>>> is already NULL.
>>> 
>>> Here's a question for the group: if we pull this single host out of its
>>> cluster, will it do anything to the VMs that are already in this zone?
>>> We have had similar problems in our other zones that have multiple hosts,
>>> but we've been able to pull those hosts out, rebuild them and re-add them,
>>> and then everything is happy.
>>> 
>>> Can we do the same for a single node cluster?
>>> 
>>> On Feb 11, 2013, at 3:54 PM, Anthony Xu <Xu...@citrix.com> wrote:
>>> 
>>>> Try this,
>>>> 
>>>> - stop management server
>>>> - null out mgmt_server_id for hosts in host table
>>>> - start management server
>>>> 
>>>> 
>>>> Anthony
>>>> 
>>>>> -----Original Message-----
>>>>> From: Caleb Call [mailto:ccall@overstock.com]
>>>>> Sent: Monday, February 11, 2013 2:48 PM
>>>>> To: <ae...@gmail.com>
>>>>> Cc: Caleb Call; cloudstack-dev@incubator.apache.org; cloudstack-users@incubator.apache.org
>>>>> Subject: Re: Xenserver Host unable to reconnect
>>>>> 
>>>>> Yes, I canceled maintenance mode via Cloudstack.
>>>>> 
>>>>> On Feb 11, 2013, at 3:46 PM, Ahmad Emneina <ae...@gmail.com> wrote:
>>>>> 
>>>>> is the host still in maintenance mode, have you cancelled maintenance
>>>>> via cloudstack?
>>>>> 
>>>>> 
>>>>> On Mon, Feb 11, 2013 at 2:32 PM, Caleb Call <ca...@me.com> wrote:
>>>>> No luck, the mgmt_server_id is already null.
>>>>> 
>>>>> 
>>>>> On Feb 11, 2013, at 3:27 PM, Caleb Call <cc...@overstock.com> wrote:
>>>>> 
>>>>>> Yes to both of those, I should have mentioned I have tried to make
>>>>>> sure connectivity is still good.  I'll try nulling out the
>>>>>> mgmt_server_id in the host table and see if that works.
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> On Feb 11, 2013, at 3:23 PM, Ahmad Emneina <ae...@gmail.com> wrote:
>>>>>> 
>>>>>> from the management server, can you ssh to that host? can you
>>>>>> execute xe commands on that host? if yes to both those, null out the
>>>>>> mgmt_server_id from your host in the host table... then issue the
>>>>>> force reconnect. see if that helps.
>> 
> 
>

Re: Xenserver Host unable to reconnect

Posted by Nitin Mehta <Ni...@citrix.com>.

Would it not interfere when you are adding it back? Do you not have to
set the removed column in the host table?

On 12/02/13 12:00 PM, "Devdeep Singh" <de...@citrix.com> wrote:

>Hi Caleb,
>
>Do you have any instances running on the host? If not, then removing it
>shouldn't cause any problems. Can you also check if xapi is running on
>the host.
>
>Regards,
>Devdeep
>
>> -----Original Message-----
>> From: Caleb Call [mailto:calebcall@me.com]
>> Sent: Tuesday, February 12, 2013 5:37 AM
>> To: cloudstack-users@incubator.apache.org
>> Cc: 'Caleb Call'; <ae...@gmail.com>; cloudstack-
>> dev@incubator.apache.org
>> Subject: Re: Xenserver Host unable to reconnect
>> 
>> Tried this but still nothing.  As I mentioned before, the
>>mgmt_server_id is
>> already NULL.
>> 
>> Here's a question for the group, if we pull this single host out of
>>it's cluster,
>> will it do anything to the VMs that area already in this zone?  We have
>>had
>> similar problems in our other zones that have multiple hosts, but we've
>>been
>> able to pull those hosts out, rebuild them and re-add them and then
>> everything is happy.
>> 
>> Can we do the same for a single node cluster?
>> 
>> On Feb 11, 2013, at 3:54 PM, Anthony Xu <Xu...@citrix.com> wrote:
>> 
>> > Try this,
>> >
>> > - stop management server
>> > - null out mgmt_server_id for hosts in host table
>> > - start management server
>> >
>> >
>> > Anthony
>> >
>> >> -----Original Message-----
>> >> From: Caleb Call [mailto:ccall@overstock.com]
>> >> Sent: Monday, February 11, 2013 2:48 PM
>> >> To: <ae...@gmail.com>
>> >> Cc: Caleb Call; cloudstack-dev@incubator.apache.org; cloudstack-
>> >> users@incubator.apache.org
>> >> Subject: Re: Xenserver Host unable to reconnect
>> >>
>> >> Yes, I canceled maintenance mode via Cloudstack.
>> >>
>> >> On Feb 11, 2013, at 3:46 PM, Ahmad Emneina
>> >> <ae...@gmail.com>>
>> >> wrote:
>> >>
>> >> is the host still in maintenance mode, have you cancelled maintenance
>> >> via cloudstack?
>> >>
>> >>
>> >> On Mon, Feb 11, 2013 at 2:32 PM, Caleb Call
>> >> <ca...@me.com>> wrote:
>> >> No luck, the mgmt_server_id is already null.
>> >>
>> >>
>> >> On Feb 11, 2013, at 3:27 PM, Caleb Call
>> >> <cc...@overstock.com>> wrote:
>> >>
>> >>> Yes to both of those, I should have mentioned I have tried to make
>> >> sure connectivity is still good.  I'll try nulling out the
>> >> mgmt_server_id in the host table and see if that works.
>> >>>
>> >>> Thanks
>> >>>
>> >>> On Feb 11, 2013, at 3:23 PM, Ahmad Emneina
>> >>
>> <ae...@gmail.com><mailto:aemneina@gm
>> ail.
>> >> co
>> >> m<ma...@gmail.com>>>
>> >>> wrote:
>> >>>
>> >>> from the management server, can you ssh to that host? can you
>> >>> execute
>> >> xe commands on that host? if yes to both those, null out the
>> >> mgmt_server_id from your host in the host table... then issue the
>> >> force reconnect. see if that helps.
>> >>>
>> >>>
>> >>> On Mon, Feb 11, 2013 at 2:17 PM, Caleb Call
>> >>
>> <cc...@overstock.com><mailto:ccall@oversto
>> >> ck .com<ma...@overstock.com>>> wrote:
>> >>> We have a zone that has a single host in it.  We also recently
>> >> updated to 4.0 from 3.0.2 (this may not be relevant but figured I'd
>> >> mention it anyways).  We put our host in maintenance mode (all VMs
>> >> were shutdown, etc) and applied some patches that were waiting to be
>> applied.
>> >> After coming back up, it now is unable to reconnect, when I try to
>> >> force reconnect, I get the following in the management log:
>> >>>
>> >>> 2013-02-11 15:04:34,541 DEBUG [ehcache.store.MemoryStore]
>> (catalina-
>> >> exec-19:null) UserDaoCache: UserDaoMemoryStore hit for 10
>> >>> 2013-02-11 15:04:34,578 DEBUG [cloud.async.AsyncJobManagerImpl]
>> >> (catalina-exec-19:null) submit async job-4806, details: AsyncJobVO
>> >> {id:4806, userId: 10, accountId: 7, sessionKey: null, instanceType:
>> >> Host, instanceId: 25, cmd: com.cloud.api.commands.ReconnectHostCmd,
>> >> cmdOriginator: null, cmdInfo: {"id":"6bc87ba4-52d4-4477-a417-46886d03698d","response":"json","sessionkey":"6IzB5H0fVA9f9FgdWtbNG9GdB5E\u003d","ctxUserId":"10","_":"1360620274351","ctxAccountId":"7","ctxStartEventId":"15461"},
>> >> cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0,
>> >> processStatus: 0, resultCode: 0, result: null,
>> >> initMsid: 145320940120008, completeMsid: null, lastUpdated: null,
>> >> lastPolled: null, created: null}
>> >>> 2013-02-11 15:04:34,579 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Executing com.cloud.api.commands.ReconnectHostCmd for job-4806
>> >>> 2013-02-11 15:04:34,587 INFO  [agent.manager.AgentManagerImpl] (Job-Executor-3:job-4806) Unable to disconnect host because it is not connected to this server: 25
>> >>> 2013-02-11 15:04:34,587 WARN  [api.commands.ReconnectHostCmd] (Job-Executor-3:job-4806) Exception:
>> >>> com.cloud.api.ServerApiException
>> >>>       at com.cloud.api.commands.ReconnectHostCmd.execute(ReconnectHostCmd.java:108)
>> >>>       at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
>> >>>       at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:432)
>> >>>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >>>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >>>       at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >>>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >>>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >>>       at java.lang.Thread.run(Thread.java:679)
>> >>> 2013-02-11 15:04:34,587 WARN  [cloud.api.ApiDispatcher] (Job-Executor-3:job-4806) class com.cloud.api.ServerApiException : null
>> >>> 2013-02-11 15:04:34,588 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Complete async job-4806, jobStatus: 2, resultCode: 530, result: Error Code: 534 Error text: null
>> >>> 2013-02-11 15:04:39,624 DEBUG [ehcache.store.MemoryStore] (catalina-exec-17:null) UserDaoCache: UserDaoMemoryStore hit for 10
>> >>> 2013-02-11 15:04:39,635 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-17:null) Async job-4806 completed
>> >>>
>> >>>
>> >>> I can't find in the logs where it's trying (besides the force
>> >>> reconnect) to reconnect on its own.  I do see where it acknowledges
>> >>> the state of Alert for the host, but it doesn't give any reason as to
>> >>> why.
>> >>>
>> >>> The only indication I can see that it's even trying is these lines:
>> >>>
>> >>> 2013-02-11 11:47:05,670 DEBUG [xen.resource.XenServerConnectionPool] (ClusteredAgentManager Timer:null) Failed to slave local login to 10.5.1.14
>> >>> 2013-02-11 11:47:05,671 WARN  [cloud.resource.DiscovererBase] (ClusteredAgentManager Timer:null) Unable to configure resource due to Can not create slave connection to 10.5.1.14
>> >>>
>> >>> 10.5.1.14 is the host that should be reconnecting but is not.
>> >>>
>> >>> Anything else I can look at as to why it's not connecting?  Any
>> >>> suggestions on why my host won't reconnect?
>> >>>
>> >>> Thanks

Re: Xenserver Host unable to reconnect

Posted by Nitin Mehta <Ni...@citrix.com>.

Would it not interfere when you are adding it back? Do you not have to
set the removed column in the host table?
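
For anyone following along, one way to answer this is a read-only look at the rows for the host before re-adding it. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite table as a stand-in for the real host table, the sample rows are made up, and apart from mgmt_server_id and removed (both named in this thread) the column names are assumptions that may not match your schema.

```python
import sqlite3

# Toy stand-in for the `host` table. Only `mgmt_server_id` and `removed`
# are named in this thread; every other column here is an assumption.
SCHEMA = """
CREATE TABLE host (
    id INTEGER PRIMARY KEY,
    name TEXT,
    private_ip_address TEXT,
    status TEXT,
    mgmt_server_id INTEGER,
    removed TEXT
)
"""


def host_rows_for_ip(conn, ip):
    """Return (id, status, mgmt_server_id, removed) for every row with this IP."""
    cur = conn.execute(
        "SELECT id, status, mgmt_server_id, removed FROM host "
        "WHERE private_ip_address = ? ORDER BY id",
        (ip,),
    )
    return cur.fetchall()


def demo():
    """Build the toy table with made-up rows and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Hypothetical data: an older registration marked removed, plus the
    # current row for the same address with `removed` still NULL.
    conn.executemany(
        "INSERT INTO host VALUES (?, ?, ?, ?, ?, ?)",
        [
            (24, "xen-old", "10.5.1.14", "Removed", None, "2013-01-01 00:00:00"),
            (25, "xen-01", "10.5.1.14", "Alert", None, None),
        ],
    )
    return host_rows_for_ip(conn, "10.5.1.14")


if __name__ == "__main__":
    for row in demo():
        print(row)
```

A row whose removed value is NULL is still treated as present, so listing every row for the address shows whether an old registration could collide with a fresh one. Against a real deployment the same SELECT would be run with a MySQL client on the management database, ideally after taking a backup.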

On 12/02/13 12:00 PM, "Devdeep Singh" <de...@citrix.com> wrote:

>Hi Caleb,
>
>Do you have any instances running on the host? If not, then removing it
>shouldn't cause any problems. Can you also check whether xapi is running on
>the host?
>
>Regards,
>Devdeep
>

RE: Xenserver Host unable to reconnect

Posted by Devdeep Singh <de...@citrix.com>.

Hi Caleb,

Do you have any instances running on the host? If not, then removing it shouldn't cause any problems. Can you also check whether xapi is running on the host?

Regards,
Devdeep
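
A rough first check for the xapi question can be done from the management server without logging in to the host. The sketch below is an assumption-laden illustration, not a definitive test: it assumes xapi accepts connections on TCP port 443 (the usual XenServer default), and tcp_port_open is a hypothetical helper name. An open port only shows that something is listening; it does not prove xapi is healthy.

```python
import socket


def tcp_port_open(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and applies the timeout to connect()
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks
        return False


if __name__ == "__main__":
    # 10.5.1.14 is the host address quoted in this thread
    print(tcp_port_open("10.5.1.14"))
```

If this returns False from the management server while the host answers ssh, a firewall rule or a stopped xapi service would be the first things to look at on the host itself.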

> -----Original Message-----
> From: Caleb Call [mailto:calebcall@me.com]
> Sent: Tuesday, February 12, 2013 5:37 AM
> To: cloudstack-users@incubator.apache.org
> Cc: 'Caleb Call'; <ae...@gmail.com>; cloudstack-dev@incubator.apache.org
> Subject: Re: Xenserver Host unable to reconnect
> 
> Tried this but still nothing.  As I mentioned before, the mgmt_server_id is
> already NULL.
> 
> Here's a question for the group: if we pull this single host out of its cluster,
> will it do anything to the VMs that are already in this zone?  We have had
> similar problems in our other zones that have multiple hosts, but we've been
> able to pull those hosts out, rebuild them and re-add them, and then
> everything is happy.
> 
> Can we do the same for a single node cluster?
> 

Re: Xenserver Host unable to reconnect

Posted by Caleb Call <ca...@me.com>.

Tried this but still nothing.  As I mentioned before, the mgmt_server_id is already NULL.

Here's a question for the group: if we pull this single host out of its cluster, will it do anything to the VMs that are already in this zone?  We have had similar problems in our other zones that have multiple hosts, but we've been able to pull those hosts out, rebuild them and re-add them, and then everything is happy.

Can we do the same for a single node cluster?

On Feb 11, 2013, at 3:54 PM, Anthony Xu <Xu...@citrix.com> wrote:

> Try this,
> 
> - stop management server
> - null out mgmt_server_id for hosts in host table
> - start management server
> 
> 
> Anthony
> 
>> -----Original Message-----
>> From: Caleb Call [mailto:ccall@overstock.com]
>> Sent: Monday, February 11, 2013 2:48 PM
>> To: <ae...@gmail.com>
>> Cc: Caleb Call; cloudstack-dev@incubator.apache.org; cloudstack-
>> users@incubator.apache.org
>> Subject: Re: Xenserver Host unable to reconnect
>> 
>> Yes, I canceled maintenance mode via Cloudstack.
>> 
>> On Feb 11, 2013, at 3:46 PM, Ahmad Emneina
>> <ae...@gmail.com>
>> wrote:
>> 
>> is the host still in maintenance mode, have you cancelled maintenance
>> via cloudstack?
>> 
>> 
>> On Mon, Feb 11, 2013 at 2:32 PM, Caleb Call
>> <ca...@me.com> wrote:
>> No luck, the mgmt_server_id is already null.
>> 
>> 
>> On Feb 11, 2013, at 3:27 PM, Caleb Call
>> <cc...@overstock.com> wrote:
>> 
>>> Yes to both of those, I should have mentioned I have tried to make
>> sure connectivity is still good.  I'll try nulling out the
>> mgmt_server_id in the host table and see if that works.
>>> 
>>> Thanks
>>> 
>>> On Feb 11, 2013, at 3:23 PM, Ahmad Emneina
>> <ae...@gmail.com>
>>> wrote:
>>> 
>>> from the management server, can you ssh to that host? can you execute
>> xe commands on that host? if yes to both those, null out the
>> mgmt_server_id from your host in the host table... then issue the force
>> reconnect. see if that helps.
>>> 
>>> 
>>> On Mon, Feb 11, 2013 at 2:17 PM, Caleb Call
>> <cc...@overstock.com> wrote:
>>> We have a zone that has a single host in it.  We also recently
>> updated to 4.0 from 3.0.2 (this may not be relevant but figured I'd
>> mention it anyways).  We put our host in maintenance mode (all VMs were
>> shutdown, etc) and applied some patches that were waiting to be applied.
>> After coming back up, it now is unable to reconnect, when I try to
>> force reconnect, I get the following in the management log:
>>> 
>>> 2013-02-11 15:04:34,541 DEBUG [ehcache.store.MemoryStore] (catalina-
>> exec-19:null) UserDaoCache: UserDaoMemoryStore hit for 10
>>> 2013-02-11 15:04:34,578 DEBUG [cloud.async.AsyncJobManagerImpl]
>> (catalina-exec-19:null) submit async job-4806, details: AsyncJobVO
>> {id:4806, userId: 10, accountId: 7, sessionKey: null, instanceType:
>> Host, instanceId: 25, cmd: com.cloud.api.commands.ReconnectHostCmd,
>> cmdOriginator: null, cmdInfo: {"id":"6bc87ba4-52d4-4477-a417-
>> 46886d03698d","response":"json","sessionkey":"6IzB5H0fVA9f9FgdWtbNG9GdB
>> 5E\u003d","ctxUserId":"10","_":"1360620274351","ctxAccountId":"7","ctxS
>> tartEventId":"15461"}, cmdVersion: 0, callbackType: 0, callbackAddress:
>> null, status: 0, processStatus: 0, resultCode: 0, result: null,
>> initMsid: 145320940120008, completeMsid: null, lastUpdated: null,
>> lastPolled: null, created: null}
>>> 2013-02-11 15:04:34,579 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-
>> Executor-3:job-4806) Executing com.cloud.api.commands.ReconnectHostCmd
>> for job-4806
>>> 2013-02-11 15:04:34,587 INFO  [agent.manager.AgentManagerImpl] (Job-
>> Executor-3:job-4806) Unable to disconnect host because it is not
>> connected to this server: 25
>>> 2013-02-11 15:04:34,587 WARN  [api.commands.ReconnectHostCmd] (Job-
>> Executor-3:job-4806) Exception:
>>> com.cloud.api.ServerApiException
>>>       at
>> com.cloud.api.commands.ReconnectHostCmd.execute(ReconnectHostCmd.java:1
>> 08)
>>>       at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
>>>       at
>> com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:432)
>>>       at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>       at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>       at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>       at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.ja
>> va:1110)
>>>       at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.j
>> ava:603)
>>>       at java.lang.Thread.run(Thread.java:679)
>>> 2013-02-11 15:04:34,587 WARN  [cloud.api.ApiDispatcher] (Job-
>> Executor-3:job-4806) class com.cloud.api.ServerApiException : null
>>> 2013-02-11 15:04:34,588 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-
>> Executor-3:job-4806) Complete async job-4806, jobStatus: 2, resultCode:
>> 530, result: Error Code: 534 Error text: null
>>> 2013-02-11 15:04:39,624 DEBUG [ehcache.store.MemoryStore] (catalina-
>> exec-17:null) UserDaoCache: UserDaoMemoryStore hit for 10
>>> 2013-02-11 15:04:39,635 DEBUG [cloud.async.AsyncJobManagerImpl]
>> (catalina-exec-17:null) Async job-4806 completed
>>> 
>>> 
>>> I can't find in the logs where it's trying (besides the force
>> reconnect) to reconnect on it's own.  I do see where it acknowledges
>> the state of Alert for the host, but doesn't give any reasoning as to
>> why.
>>> 
>>> The only thing I can see any indication it's even trying is this line:
>>> 
>>> 2013-02-11 11:47:05,670 DEBUG [xen.resource.XenServerConnectionPool]
>> (ClusteredAgentManager Timer:null) Failed to slave local login to
>> 10.5.1.14
>>> 2013-02-11 11:47:05,671 WARN  [cloud.resource.DiscovererBase]
>> (ClusteredAgentManager Timer:null) Unable to configure resource due to
>> Can not create slave connection to 10.5.1.14
>>> 
>>> 10.5.1.14 is the host that should be reconnecting but is not.
>>> 
>>> Anything else I can look at as to why it's not connecting?  Any
>> suggestions on why my host won't reconnect?
>>> 
>>> Thanks

RE: Xenserver Host unable to reconnect

Posted by Anthony Xu <Xu...@citrix.com>.

Try this,

- stop management server
- null out mgmt_server_id for hosts in host table
- start management server


Anthony

> -----Original Message-----
> From: Caleb Call [mailto:ccall@overstock.com]
> Sent: Monday, February 11, 2013 2:48 PM
> To: <ae...@gmail.com>
> Cc: Caleb Call; cloudstack-dev@incubator.apache.org; cloudstack-
> users@incubator.apache.org
> Subject: Re: Xenserver Host unable to reconnect
> 
> Yes, I canceled maintenance mode via Cloudstack.
> 
> On Feb 11, 2013, at 3:46 PM, Ahmad Emneina
> <ae...@gmail.com>
>  wrote:
> 
> is the host still in maintenance mode, have you cancelled maintenance
> via cloudstack?
> 
> 
> On Mon, Feb 11, 2013 at 2:32 PM, Caleb Call
> <ca...@me.com> wrote:
> No luck, the mgmt_server_id is already null.
> 
> 
> On Feb 11, 2013, at 3:27 PM, Caleb Call
> <cc...@overstock.com> wrote:
> 
> > Yes to both of those, I should have mentioned I have tried to make
> sure connectivity is still good.  I'll try nulling out the
> mgmt_server_id in the host table and see if that works.
> >
> > Thanks
> >
> > On Feb 11, 2013, at 3:23 PM, Ahmad Emneina
> <ae...@gmail.com>
> > wrote:
> >
> > from the management server, can you ssh to that host? can you execute
> xe commands on that host? if yes to both those, null out the
> mgmt_server_id from your host in the host table... then issue the force
> reconnect. see if that helps.
> >
> >
> > On Mon, Feb 11, 2013 at 2:17 PM, Caleb Call
> <cc...@overstock.com> wrote:
> > We have a zone that has a single host in it.  We also recently
> updated to 4.0 from 3.0.2 (this may not be relevant but figured I'd
> mention it anyways).  We put our host in maintenance mode (all VMs were
> shutdown, etc) and applied some patches that were waiting to be applied.
> After coming back up, it now is unable to reconnect, when I try to
> force reconnect, I get the following in the management log:
> >
> > 2013-02-11 15:04:34,541 DEBUG [ehcache.store.MemoryStore] (catalina-
> exec-19:null) UserDaoCache: UserDaoMemoryStore hit for 10
> > 2013-02-11 15:04:34,578 DEBUG [cloud.async.AsyncJobManagerImpl]
> (catalina-exec-19:null) submit async job-4806, details: AsyncJobVO
> {id:4806, userId: 10, accountId: 7, sessionKey: null, instanceType:
> Host, instanceId: 25, cmd: com.cloud.api.commands.ReconnectHostCmd,
> cmdOriginator: null, cmdInfo: {"id":"6bc87ba4-52d4-4477-a417-
> 46886d03698d","response":"json","sessionkey":"6IzB5H0fVA9f9FgdWtbNG9GdB
> 5E\u003d","ctxUserId":"10","_":"1360620274351","ctxAccountId":"7","ctxS
> tartEventId":"15461"}, cmdVersion: 0, callbackType: 0, callbackAddress:
> null, status: 0, processStatus: 0, resultCode: 0, result: null,
> initMsid: 145320940120008, completeMsid: null, lastUpdated: null,
> lastPolled: null, created: null}
> > 2013-02-11 15:04:34,579 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-
> Executor-3:job-4806) Executing com.cloud.api.commands.ReconnectHostCmd
> for job-4806
> > 2013-02-11 15:04:34,587 INFO  [agent.manager.AgentManagerImpl] (Job-
> Executor-3:job-4806) Unable to disconnect host because it is not
> connected to this server: 25
> > 2013-02-11 15:04:34,587 WARN  [api.commands.ReconnectHostCmd] (Job-
> Executor-3:job-4806) Exception:
> > com.cloud.api.ServerApiException
> >        at
> com.cloud.api.commands.ReconnectHostCmd.execute(ReconnectHostCmd.java:1
> 08)
> >        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
> >        at
> com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:432)
> >        at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >        at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >        at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.ja
> va:1110)
> >        at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.j
> ava:603)
> >        at java.lang.Thread.run(Thread.java:679)
> > 2013-02-11 15:04:34,587 WARN  [cloud.api.ApiDispatcher] (Job-
> Executor-3:job-4806) class com.cloud.api.ServerApiException : null
> > 2013-02-11 15:04:34,588 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-
> Executor-3:job-4806) Complete async job-4806, jobStatus: 2, resultCode:
> 530, result: Error Code: 534 Error text: null
> > 2013-02-11 15:04:39,624 DEBUG [ehcache.store.MemoryStore] (catalina-
> exec-17:null) UserDaoCache: UserDaoMemoryStore hit for 10
> > 2013-02-11 15:04:39,635 DEBUG [cloud.async.AsyncJobManagerImpl]
> (catalina-exec-17:null) Async job-4806 completed
> >
> >
> > I can't find in the logs where it's trying (besides the force
> reconnect) to reconnect on it's own.  I do see where it acknowledges
> the state of Alert for the host, but doesn't give any reasoning as to
> why.
> >
> > The only thing I can see any indication it's even trying is this line:
> >
> > 2013-02-11 11:47:05,670 DEBUG [xen.resource.XenServerConnectionPool]
> (ClusteredAgentManager Timer:null) Failed to slave local login to
> 10.5.1.14
> > 2013-02-11 11:47:05,671 WARN  [cloud.resource.DiscovererBase]
> (ClusteredAgentManager Timer:null) Unable to configure resource due to
> Can not create slave connection to 10.5.1.14
> >
> > 10.5.1.14 is the host that should be reconnecting but is not.
> >
> > Anything else I can look at as to why it's not connecting?  Any
> suggestions on why my host won't reconnect?
> >
> > Thanks

Re: Xenserver Host unable to reconnect

Posted by Caleb Call <cc...@overstock.com>.

Yes, I canceled maintenance mode via Cloudstack.

On Feb 11, 2013, at 3:46 PM, Ahmad Emneina <ae...@gmail.com> wrote:

is the host still in maintenance mode, have you cancelled maintenance via cloudstack?


On Mon, Feb 11, 2013 at 2:32 PM, Caleb Call <ca...@me.com> wrote:
No luck, the mgmt_server_id is already null.


On Feb 11, 2013, at 3:27 PM, Caleb Call <cc...@overstock.com> wrote:

> Yes to both of those, I should have mentioned I have tried to make sure connectivity is still good.  I'll try nulling out the mgmt_server_id in the host table and see if that works.
>
> Thanks
>
> On Feb 11, 2013, at 3:23 PM, Ahmad Emneina <ae...@gmail.com> wrote:
>
> from the management server, can you ssh to that host? can you execute xe commands on that host? if yes to both those, null out the mgmt_server_id from your host in the host table... then issue the force reconnect. see if that helps.
>
>
> On Mon, Feb 11, 2013 at 2:17 PM, Caleb Call <cc...@overstock.com> wrote:
> We have a zone that has a single host in it.  We also recently updated to 4.0 from 3.0.2 (this may not be relevant, but I figured I'd mention it anyway).  We put our host in maintenance mode (all VMs were shut down, etc.) and applied some pending patches.  After coming back up, the host is now unable to reconnect. When I try to force a reconnect, I get the following in the management log:
>
> 2013-02-11 15:04:34,541 DEBUG [ehcache.store.MemoryStore] (catalina-exec-19:null) UserDaoCache: UserDaoMemoryStore hit for 10
> 2013-02-11 15:04:34,578 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-19:null) submit async job-4806, details: AsyncJobVO {id:4806, userId: 10, accountId: 7, sessionKey: null, instanceType: Host, instanceId: 25, cmd: com.cloud.api.commands.ReconnectHostCmd, cmdOriginator: null, cmdInfo: {"id":"6bc87ba4-52d4-4477-a417-46886d03698d","response":"json","sessionkey":"6IzB5H0fVA9f9FgdWtbNG9GdB5E\u003d","ctxUserId":"10","_":"1360620274351","ctxAccountId":"7","ctxStartEventId":"15461"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 145320940120008, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
> 2013-02-11 15:04:34,579 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Executing com.cloud.api.commands.ReconnectHostCmd for job-4806
> 2013-02-11 15:04:34,587 INFO  [agent.manager.AgentManagerImpl] (Job-Executor-3:job-4806) Unable to disconnect host because it is not connected to this server: 25
> 2013-02-11 15:04:34,587 WARN  [api.commands.ReconnectHostCmd] (Job-Executor-3:job-4806) Exception:
> com.cloud.api.ServerApiException
>        at com.cloud.api.commands.ReconnectHostCmd.execute(ReconnectHostCmd.java:108)
>        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
>        at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:432)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>        at java.lang.Thread.run(Thread.java:679)
> 2013-02-11 15:04:34,587 WARN  [cloud.api.ApiDispatcher] (Job-Executor-3:job-4806) class com.cloud.api.ServerApiException : null
> 2013-02-11 15:04:34,588 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Complete async job-4806, jobStatus: 2, resultCode: 530, result: Error Code: 534 Error text: null
> 2013-02-11 15:04:39,624 DEBUG [ehcache.store.MemoryStore] (catalina-exec-17:null) UserDaoCache: UserDaoMemoryStore hit for 10
> 2013-02-11 15:04:39,635 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-17:null) Async job-4806 completed
>
>
> I can't find anywhere in the logs where it's trying to reconnect on its own (besides the force reconnect).  I do see where it acknowledges the Alert state for the host, but it doesn't give any reason why.
>
> The only indication I can see that it's even trying is these lines:
>
> 2013-02-11 11:47:05,670 DEBUG [xen.resource.XenServerConnectionPool] (ClusteredAgentManager Timer:null) Failed to slave local login to 10.5.1.14
> 2013-02-11 11:47:05,671 WARN  [cloud.resource.DiscovererBase] (ClusteredAgentManager Timer:null) Unable to configure resource due to Can not create slave connection to 10.5.1.14
>
> 10.5.1.14 is the host that should be reconnecting but is not.
>
> Is there anything else I can look at to find out why it's not connecting?  Any suggestions on why my host won't reconnect?
>
> Thanks

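The management-log excerpt quoted in this message mixes lines from several threads. As a rough sketch (not part of CloudStack), the Python below pulls out only the entries for one async job, such as job-4806, at INFO level or above. The regular expression is inferred from the lines quoted in this thread, and the names parse_line and entries_for_job are made up for illustration.

```python
import re

# Pattern inferred from the log lines quoted in this thread, e.g.
# 2013-02-11 15:04:34,587 WARN  [api.commands.ReconnectHostCmd] (Job-Executor-3:job-4806) Exception:
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]*)\)\s?"
    r"(?P<message>.*)$"
)

# Relative severity of the levels that appear in the management log.
LEVEL_ORDER = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4}


def parse_line(line):
    """Return a dict of fields for a log line, or None for continuation
    lines such as stack-trace frames."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def entries_for_job(lines, job_id, min_level="INFO"):
    """Yield parsed entries whose thread tag ends with ':<job_id>' and whose
    level is at least min_level."""
    threshold = LEVEL_ORDER[min_level]
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # stack-trace frame or other non-header line
        if not entry["thread"].endswith(":" + job_id):
            continue  # line belongs to a different thread or job
        if LEVEL_ORDER.get(entry["level"], 0) >= threshold:
            yield entry
```

Fed the lines quoted above with the job ID "job-4806", this keeps the INFO line from AgentManagerImpl and the two WARN lines, and skips the DEBUG entries and the stack-trace frames. To scan a real log, pass an open file object as `lines`; the log path depends on the installation.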

Re: Xenserver Host unable to reconnect

Posted by Ahmad Emneina <ae...@gmail.com>.

Is the host still in maintenance mode? Have you cancelled maintenance via
CloudStack?


On Mon, Feb 11, 2013 at 2:32 PM, Caleb Call <ca...@me.com> wrote:

> No luck, the mgmt_server_id is already null.
>
>
> On Feb 11, 2013, at 3:27 PM, Caleb Call <cc...@overstock.com> wrote:
>
> > Yes to both of those. I should have mentioned that I already checked that connectivity is still good.  I'll try nulling out the mgmt_server_id in the host table and see if that works.
> >
> > Thanks
> >
> > On Feb 11, 2013, at 3:23 PM, Ahmad Emneina <aemneina@gmail.com> wrote:
> >
> > From the management server, can you ssh to that host? Can you execute xe commands on that host? If yes to both, null out the mgmt_server_id for your host in the host table, then issue the force reconnect and see if that helps.
> >
> >
> > On Mon, Feb 11, 2013 at 2:17 PM, Caleb Call <ccall@overstock.com> wrote:
> > We have a zone that has a single host in it.  We also recently updated to 4.0 from 3.0.2 (this may not be relevant, but I figured I'd mention it anyway).  We put our host in maintenance mode (all VMs were shut down, etc.) and applied some pending patches.  After coming back up, the host is now unable to reconnect. When I try to force a reconnect, I get the following in the management log:
> >
> > 2013-02-11 15:04:34,541 DEBUG [ehcache.store.MemoryStore] (catalina-exec-19:null) UserDaoCache: UserDaoMemoryStore hit for 10
> > 2013-02-11 15:04:34,578 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-19:null) submit async job-4806, details: AsyncJobVO {id:4806, userId: 10, accountId: 7, sessionKey: null, instanceType: Host, instanceId: 25, cmd: com.cloud.api.commands.ReconnectHostCmd, cmdOriginator: null, cmdInfo: {"id":"6bc87ba4-52d4-4477-a417-46886d03698d","response":"json","sessionkey":"6IzB5H0fVA9f9FgdWtbNG9GdB5E\u003d","ctxUserId":"10","_":"1360620274351","ctxAccountId":"7","ctxStartEventId":"15461"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 145320940120008, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
> > 2013-02-11 15:04:34,579 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Executing com.cloud.api.commands.ReconnectHostCmd for job-4806
> > 2013-02-11 15:04:34,587 INFO  [agent.manager.AgentManagerImpl] (Job-Executor-3:job-4806) Unable to disconnect host because it is not connected to this server: 25
> > 2013-02-11 15:04:34,587 WARN  [api.commands.ReconnectHostCmd] (Job-Executor-3:job-4806) Exception:
> > com.cloud.api.ServerApiException
> >        at com.cloud.api.commands.ReconnectHostCmd.execute(ReconnectHostCmd.java:108)
> >        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
> >        at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:432)
> >        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >        at java.lang.Thread.run(Thread.java:679)
> > 2013-02-11 15:04:34,587 WARN  [cloud.api.ApiDispatcher] (Job-Executor-3:job-4806) class com.cloud.api.ServerApiException : null
> > 2013-02-11 15:04:34,588 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Complete async job-4806, jobStatus: 2, resultCode: 530, result: Error Code: 534 Error text: null
> > 2013-02-11 15:04:39,624 DEBUG [ehcache.store.MemoryStore] (catalina-exec-17:null) UserDaoCache: UserDaoMemoryStore hit for 10
> > 2013-02-11 15:04:39,635 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-17:null) Async job-4806 completed
> >
> >
> > I can't find anywhere in the logs where it's trying to reconnect on its own (besides the force reconnect).  I do see where it acknowledges the Alert state for the host, but it doesn't give any reason why.
> >
> > The only indication I can see that it's even trying is these lines:
> >
> > 2013-02-11 11:47:05,670 DEBUG [xen.resource.XenServerConnectionPool] (ClusteredAgentManager Timer:null) Failed to slave local login to 10.5.1.14
> > 2013-02-11 11:47:05,671 WARN  [cloud.resource.DiscovererBase] (ClusteredAgentManager Timer:null) Unable to configure resource due to Can not create slave connection to 10.5.1.14
> >
> > 10.5.1.14 is the host that should be reconnecting but is not.
> >
> > Is there anything else I can look at to find out why it's not connecting?  Any suggestions on why my host won't reconnect?
> >
> > Thanks

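The first two checks suggested in the quoted text (can the management server reach the host over ssh, and can it talk to the XenServer API) can be roughly approximated with a TCP reachability test. This is only a sketch: the function names are hypothetical, and ports 22 (SSH) and 443 (XenAPI over HTTPS) are assumed defaults that may differ in a given setup.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_host(host, ports=(22, 443), timeout=3.0):
    """Map each port to its reachability (True/False) for the given host."""
    return {port: port_open(host, port, timeout) for port in ports}
```

Run from the management server, `check_host("10.5.1.14")` would report whether each port accepts connections. An open port does not prove that the credentials CloudStack holds for the host still work, which is what the "Failed to slave local login" line points at, so logging in with ssh and running an xe command by hand remains the real test.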

Re: Xenserver Host unable to reconnect

Posted by Caleb Call <ca...@me.com>.

No luck, the mgmt_server_id is already null.  


On Feb 11, 2013, at 3:27 PM, Caleb Call <cc...@overstock.com> wrote:

> Yes to both of those. I should have mentioned that I already checked that connectivity is still good.  I'll try nulling out the mgmt_server_id in the host table and see if that works.
> 
> Thanks
> 
> On Feb 11, 2013, at 3:23 PM, Ahmad Emneina <ae...@gmail.com> wrote:
>
> From the management server, can you ssh to that host? Can you execute xe commands on that host? If yes to both, null out the mgmt_server_id for your host in the host table, then issue the force reconnect and see if that helps.
> 
> 
> On Mon, Feb 11, 2013 at 2:17 PM, Caleb Call <cc...@overstock.com> wrote:
> We have a zone that has a single host in it.  We also recently updated to 4.0 from 3.0.2 (this may not be relevant, but I figured I'd mention it anyway).  We put our host in maintenance mode (all VMs were shut down, etc.) and applied some pending patches.  After coming back up, the host is now unable to reconnect. When I try to force a reconnect, I get the following in the management log:
> 
> 2013-02-11 15:04:34,541 DEBUG [ehcache.store.MemoryStore] (catalina-exec-19:null) UserDaoCache: UserDaoMemoryStore hit for 10
> 2013-02-11 15:04:34,578 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-19:null) submit async job-4806, details: AsyncJobVO {id:4806, userId: 10, accountId: 7, sessionKey: null, instanceType: Host, instanceId: 25, cmd: com.cloud.api.commands.ReconnectHostCmd, cmdOriginator: null, cmdInfo: {"id":"6bc87ba4-52d4-4477-a417-46886d03698d","response":"json","sessionkey":"6IzB5H0fVA9f9FgdWtbNG9GdB5E\u003d","ctxUserId":"10","_":"1360620274351","ctxAccountId":"7","ctxStartEventId":"15461"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 145320940120008, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
> 2013-02-11 15:04:34,579 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Executing com.cloud.api.commands.ReconnectHostCmd for job-4806
> 2013-02-11 15:04:34,587 INFO  [agent.manager.AgentManagerImpl] (Job-Executor-3:job-4806) Unable to disconnect host because it is not connected to this server: 25
> 2013-02-11 15:04:34,587 WARN  [api.commands.ReconnectHostCmd] (Job-Executor-3:job-4806) Exception:
> com.cloud.api.ServerApiException
>        at com.cloud.api.commands.ReconnectHostCmd.execute(ReconnectHostCmd.java:108)
>        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
>        at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:432)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>        at java.lang.Thread.run(Thread.java:679)
> 2013-02-11 15:04:34,587 WARN  [cloud.api.ApiDispatcher] (Job-Executor-3:job-4806) class com.cloud.api.ServerApiException : null
> 2013-02-11 15:04:34,588 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Complete async job-4806, jobStatus: 2, resultCode: 530, result: Error Code: 534 Error text: null
> 2013-02-11 15:04:39,624 DEBUG [ehcache.store.MemoryStore] (catalina-exec-17:null) UserDaoCache: UserDaoMemoryStore hit for 10
> 2013-02-11 15:04:39,635 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-17:null) Async job-4806 completed
> 
> 
> I can't find anywhere in the logs where it's trying to reconnect on its own (besides the force reconnect).  I do see where it acknowledges the Alert state for the host, but it doesn't give any reason why.
>
> The only indication I can see that it's even trying is these lines:
> 
> 2013-02-11 11:47:05,670 DEBUG [xen.resource.XenServerConnectionPool] (ClusteredAgentManager Timer:null) Failed to slave local login to 10.5.1.14
> 2013-02-11 11:47:05,671 WARN  [cloud.resource.DiscovererBase] (ClusteredAgentManager Timer:null) Unable to configure resource due to Can not create slave connection to 10.5.1.14
> 
> 10.5.1.14 is the host that should be reconnecting but is not.
> 
> Is there anything else I can look at to find out why it's not connecting?  Any suggestions on why my host won't reconnect?
> 
> Thanks

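The "null out the mgmt_server_id" step discussed in this message can be illustrated as a check followed by a guarded update. This is only a sketch: it uses an in-memory SQLite table as a stand-in for the `host` table in the CloudStack database, the `name` and `status` columns and the sample row are assumptions, and the helper names are hypothetical. Host ID 25 and the management server ID are taken from the log quoted above.

```python
import sqlite3

# Stand-in for the CloudStack `host` table. Only `id` and `mgmt_server_id`
# are named in the thread; `name` and `status` are assumed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE host ("
    "id INTEGER PRIMARY KEY, name TEXT, status TEXT, mgmt_server_id INTEGER)"
)
# Illustrative row: host 25 still claimed by a management server.
conn.execute(
    "INSERT INTO host (id, name, status, mgmt_server_id) "
    "VALUES (25, 'xen-host', 'Alert', 145320940120008)"
)


def owner_of(conn, host_id):
    """Return the mgmt_server_id recorded for host_id (None if it is NULL)."""
    row = conn.execute(
        "SELECT mgmt_server_id FROM host WHERE id = ?", (host_id,)
    ).fetchone()
    if row is None:
        raise LookupError("no host with id %d" % host_id)
    return row[0]


def clear_owner(conn, host_id):
    """Set mgmt_server_id to NULL for one host; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE host SET mgmt_server_id = NULL "
        "WHERE id = ? AND mgmt_server_id IS NOT NULL",
        (host_id,),
    )
    conn.commit()
    return cur.rowcount
```

Checking with `owner_of` first shows whether the update would do anything. When the column is already NULL, as reported here, `clear_owner` changes zero rows, so the stale-owner theory can be ruled out and the cause has to be looked for elsewhere.
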
Re: Xenserver Host unable to reconnect

Posted by Caleb Call <ca...@me.com>.

No luck, the mgmt_server_id is already null.  


On Feb 11, 2013, at 3:27 PM, Caleb Call <cc...@overstock.com> wrote:

> Yes to both of those. I should have mentioned that I already checked that connectivity is still good.  I'll try nulling out the mgmt_server_id in the host table and see if that works.
> 
> Thanks
> 
> On Feb 11, 2013, at 3:23 PM, Ahmad Emneina <ae...@gmail.com> wrote:
>
> From the management server, can you ssh to that host? Can you execute xe commands on that host? If yes to both, null out the mgmt_server_id for your host in the host table, then issue the force reconnect and see if that helps.
> 
> 
> On Mon, Feb 11, 2013 at 2:17 PM, Caleb Call <cc...@overstock.com> wrote:
> We have a zone that has a single host in it.  We also recently updated to 4.0 from 3.0.2 (this may not be relevant, but I figured I'd mention it anyway).  We put our host in maintenance mode (all VMs were shut down, etc.) and applied some pending patches.  After coming back up, the host is now unable to reconnect. When I try to force a reconnect, I get the following in the management log:
> 
> 2013-02-11 15:04:34,541 DEBUG [ehcache.store.MemoryStore] (catalina-exec-19:null) UserDaoCache: UserDaoMemoryStore hit for 10
> 2013-02-11 15:04:34,578 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-19:null) submit async job-4806, details: AsyncJobVO {id:4806, userId: 10, accountId: 7, sessionKey: null, instanceType: Host, instanceId: 25, cmd: com.cloud.api.commands.ReconnectHostCmd, cmdOriginator: null, cmdInfo: {"id":"6bc87ba4-52d4-4477-a417-46886d03698d","response":"json","sessionkey":"6IzB5H0fVA9f9FgdWtbNG9GdB5E\u003d","ctxUserId":"10","_":"1360620274351","ctxAccountId":"7","ctxStartEventId":"15461"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 145320940120008, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
> 2013-02-11 15:04:34,579 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Executing com.cloud.api.commands.ReconnectHostCmd for job-4806
> 2013-02-11 15:04:34,587 INFO  [agent.manager.AgentManagerImpl] (Job-Executor-3:job-4806) Unable to disconnect host because it is not connected to this server: 25
> 2013-02-11 15:04:34,587 WARN  [api.commands.ReconnectHostCmd] (Job-Executor-3:job-4806) Exception:
> com.cloud.api.ServerApiException
>        at com.cloud.api.commands.ReconnectHostCmd.execute(ReconnectHostCmd.java:108)
>        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
>        at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:432)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>        at java.lang.Thread.run(Thread.java:679)
> 2013-02-11 15:04:34,587 WARN  [cloud.api.ApiDispatcher] (Job-Executor-3:job-4806) class com.cloud.api.ServerApiException : null
> 2013-02-11 15:04:34,588 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Complete async job-4806, jobStatus: 2, resultCode: 530, result: Error Code: 534 Error text: null
> 2013-02-11 15:04:39,624 DEBUG [ehcache.store.MemoryStore] (catalina-exec-17:null) UserDaoCache: UserDaoMemoryStore hit for 10
> 2013-02-11 15:04:39,635 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-17:null) Async job-4806 completed
> 
> 
> I can't find where in the logs it's trying to reconnect on its own (apart from the forced reconnect). I do see where it acknowledges the Alert state for the host, but it doesn't give a reason why.
> 
> The only indication I can see that it's even trying is these lines:
> 
> 2013-02-11 11:47:05,670 DEBUG [xen.resource.XenServerConnectionPool] (ClusteredAgentManager Timer:null) Failed to slave local login to 10.5.1.14
> 2013-02-11 11:47:05,671 WARN  [cloud.resource.DiscovererBase] (ClusteredAgentManager Timer:null) Unable to configure resource due to Can not create slave connection to 10.5.1.14
> 
> 10.5.1.14 is the host that should be reconnecting but is not.
> 
> Anything else I can look at as to why it's not connecting?  Any suggestions on why my host won't reconnect?
> 
> Thanks
> 
> 

Re: Xenserver Host unable to reconnect

Posted by Caleb Call <cc...@overstock.com>.

Yes to both of those. I should have mentioned that I had already checked that connectivity is still good. I'll try nulling out the mgmt_server_id in the host table and see if that works.

Thanks

On Feb 11, 2013, at 3:23 PM, Ahmad Emneina <ae...@gmail.com> wrote:

From the management server, can you ssh to that host? Can you execute xe commands on that host? If yes to both, null out the mgmt_server_id for your host in the host table, then issue the force reconnect. See if that helps.


On Mon, Feb 11, 2013 at 2:17 PM, Caleb Call <cc...@overstock.com> wrote:
We have a zone with a single host in it. We also recently upgraded from 3.0.2 to 4.0 (this may not be relevant, but I figured I'd mention it anyway). We put the host in maintenance mode (all VMs were shut down, etc.) and applied some pending patches. After coming back up, the host is unable to reconnect. When I try to force a reconnect, I get the following in the management log:

2013-02-11 15:04:34,541 DEBUG [ehcache.store.MemoryStore] (catalina-exec-19:null) UserDaoCache: UserDaoMemoryStore hit for 10
2013-02-11 15:04:34,578 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-19:null) submit async job-4806, details: AsyncJobVO {id:4806, userId: 10, accountId: 7, sessionKey: null, instanceType: Host, instanceId: 25, cmd: com.cloud.api.commands.ReconnectHostCmd, cmdOriginator: null, cmdInfo: {"id":"6bc87ba4-52d4-4477-a417-46886d03698d","response":"json","sessionkey":"6IzB5H0fVA9f9FgdWtbNG9GdB5E\u003d","ctxUserId":"10","_":"1360620274351","ctxAccountId":"7","ctxStartEventId":"15461"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 145320940120008, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
2013-02-11 15:04:34,579 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Executing com.cloud.api.commands.ReconnectHostCmd for job-4806
2013-02-11 15:04:34,587 INFO  [agent.manager.AgentManagerImpl] (Job-Executor-3:job-4806) Unable to disconnect host because it is not connected to this server: 25
2013-02-11 15:04:34,587 WARN  [api.commands.ReconnectHostCmd] (Job-Executor-3:job-4806) Exception:
com.cloud.api.ServerApiException
        at com.cloud.api.commands.ReconnectHostCmd.execute(ReconnectHostCmd.java:108)
        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
        at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:432)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
2013-02-11 15:04:34,587 WARN  [cloud.api.ApiDispatcher] (Job-Executor-3:job-4806) class com.cloud.api.ServerApiException : null
2013-02-11 15:04:34,588 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Complete async job-4806, jobStatus: 2, resultCode: 530, result: Error Code: 534 Error text: null
2013-02-11 15:04:39,624 DEBUG [ehcache.store.MemoryStore] (catalina-exec-17:null) UserDaoCache: UserDaoMemoryStore hit for 10
2013-02-11 15:04:39,635 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-17:null) Async job-4806 completed


I can't find where in the logs it's trying to reconnect on its own (apart from the forced reconnect). I do see where it acknowledges the Alert state for the host, but it doesn't give a reason why.

The only indication I can see that it's even trying is these lines:

2013-02-11 11:47:05,670 DEBUG [xen.resource.XenServerConnectionPool] (ClusteredAgentManager Timer:null) Failed to slave local login to 10.5.1.14
2013-02-11 11:47:05,671 WARN  [cloud.resource.DiscovererBase] (ClusteredAgentManager Timer:null) Unable to configure resource due to Can not create slave connection to 10.5.1.14

10.5.1.14 is the host that should be reconnecting but is not.
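
One way to hunt for further reconnect attempts is to filter the management log for WARN-or-higher entries that mention the host's address. The following is only a rough sketch in Python: the regular expression is inferred from the log lines quoted in this thread, and the function name `entries_mentioning` is made up for illustration. It does a plain substring match, so `10.5.1.14` would also match `10.5.1.140`.

```python
import re

# Line layout inferred from the excerpts in this thread:
#   <timestamp> <LEVEL> [<logger>] (<thread>) <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]*)\)\s+"
    r"(?P<msg>.*)$"
)


def entries_mentioning(lines, needle, levels=("WARN", "ERROR")):
    """Return (timestamp, logger, message) for matching log entries.

    Lines that do not fit the layout (for example stack-trace frames)
    are skipped rather than treated as errors.
    """
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and m.group("level") in levels and needle in m.group("msg"):
            hits.append((m.group("ts"), m.group("logger"), m.group("msg")))
    return hits


# The two lines quoted above, used as sample input.
sample = [
    "2013-02-11 11:47:05,670 DEBUG [xen.resource.XenServerConnectionPool] "
    "(ClusteredAgentManager Timer:null) Failed to slave local login to 10.5.1.14",
    "2013-02-11 11:47:05,671 WARN  [cloud.resource.DiscovererBase] "
    "(ClusteredAgentManager Timer:null) Unable to configure resource due to "
    "Can not create slave connection to 10.5.1.14",
]

for ts, logger, msg in entries_mentioning(sample, "10.5.1.14"):
    print(ts, logger, msg)
```

To scan a real log, pass an open file object as `lines`; adding `"DEBUG"` to `levels` would also surface the `XenServerConnectionPool` line.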

Anything else I can look at as to why it's not connecting?  Any suggestions on why my host won't reconnect?

Thanks



Re: Xenserver Host unable to reconnect

Posted by Ahmad Emneina <ae...@gmail.com>.

From the management server, can you ssh to that host? Can you execute xe
commands on that host? If yes to both, null out the mgmt_server_id for
your host in the host table, then issue the force reconnect. See if that
helps.
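
For anyone following along, the "null out the mgmt_server_id" step could look roughly like the sketch below. This is illustrative only: it uses an in-memory SQLite table as a stand-in so it can be run anywhere, whereas a real CloudStack deployment keeps the host table in its own database. The `id` column, the host name `xen-host`, and the helper `clear_mgmt_server_id` are assumptions for the example; the host id 25 and the management server id come from the log in the original post. Take a database backup before touching a real table.

```python
import sqlite3


def clear_mgmt_server_id(conn, host_id):
    """Null out mgmt_server_id for one host and return the previous value.

    Reads the current value first, so it is visible whether the column
    was already NULL (in which case nothing is written).
    """
    row = conn.execute(
        "SELECT mgmt_server_id FROM host WHERE id = ?", (host_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no host row with id {host_id}")
    previous = row[0]
    if previous is not None:
        conn.execute(
            "UPDATE host SET mgmt_server_id = NULL WHERE id = ?", (host_id,)
        )
        conn.commit()
    return previous


# Stand-in table with just the columns this sketch needs (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE host (id INTEGER PRIMARY KEY, name TEXT, mgmt_server_id INTEGER)"
)
conn.execute("INSERT INTO host VALUES (25, 'xen-host', 145320940120008)")

print(clear_mgmt_server_id(conn, 25))  # prints 145320940120008
print(clear_mgmt_server_id(conn, 25))  # prints None (already cleared)
```

If the second call's result is what you see on the first try, the column was already null and this step will not change anything.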


On Mon, Feb 11, 2013 at 2:17 PM, Caleb Call <cc...@overstock.com> wrote:

> We have a zone with a single host in it. We also recently upgraded from
> 3.0.2 to 4.0 (this may not be relevant, but I figured I'd mention it
> anyway). We put the host in maintenance mode (all VMs were shut down,
> etc.) and applied some pending patches. After coming back up, the host
> is unable to reconnect. When I try to force a reconnect, I get the
> following in the management log:
>
> 2013-02-11 15:04:34,541 DEBUG [ehcache.store.MemoryStore] (catalina-exec-19:null) UserDaoCache: UserDaoMemoryStore hit for 10
> 2013-02-11 15:04:34,578 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-19:null) submit async job-4806, details: AsyncJobVO {id:4806, userId: 10, accountId: 7, sessionKey: null, instanceType: Host, instanceId: 25, cmd: com.cloud.api.commands.ReconnectHostCmd, cmdOriginator: null, cmdInfo: {"id":"6bc87ba4-52d4-4477-a417-46886d03698d","response":"json","sessionkey":"6IzB5H0fVA9f9FgdWtbNG9GdB5E\u003d","ctxUserId":"10","_":"1360620274351","ctxAccountId":"7","ctxStartEventId":"15461"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 145320940120008, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
> 2013-02-11 15:04:34,579 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Executing com.cloud.api.commands.ReconnectHostCmd for job-4806
> 2013-02-11 15:04:34,587 INFO  [agent.manager.AgentManagerImpl] (Job-Executor-3:job-4806) Unable to disconnect host because it is not connected to this server: 25
> 2013-02-11 15:04:34,587 WARN  [api.commands.ReconnectHostCmd] (Job-Executor-3:job-4806) Exception:
> com.cloud.api.ServerApiException
>         at com.cloud.api.commands.ReconnectHostCmd.execute(ReconnectHostCmd.java:108)
>         at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
>         at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:432)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:679)
> 2013-02-11 15:04:34,587 WARN  [cloud.api.ApiDispatcher] (Job-Executor-3:job-4806) class com.cloud.api.ServerApiException : null
> 2013-02-11 15:04:34,588 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Complete async job-4806, jobStatus: 2, resultCode: 530, result: Error Code: 534 Error text: null
> 2013-02-11 15:04:39,624 DEBUG [ehcache.store.MemoryStore] (catalina-exec-17:null) UserDaoCache: UserDaoMemoryStore hit for 10
> 2013-02-11 15:04:39,635 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-17:null) Async job-4806 completed
>
>
> I can't find where in the logs it's trying to reconnect on its own (apart
> from the forced reconnect). I do see where it acknowledges the Alert state
> for the host, but it doesn't give a reason why.
>
> The only indication I can see that it's even trying is these lines:
>
> 2013-02-11 11:47:05,670 DEBUG [xen.resource.XenServerConnectionPool] (ClusteredAgentManager Timer:null) Failed to slave local login to 10.5.1.14
> 2013-02-11 11:47:05,671 WARN  [cloud.resource.DiscovererBase] (ClusteredAgentManager Timer:null) Unable to configure resource due to Can not create slave connection to 10.5.1.14
>
> 10.5.1.14 is the host that should be reconnecting but is not.
>
> Anything else I can look at as to why it's not connecting?  Any
> suggestions on why my host won't reconnect?
>
> Thanks
>
>
