Posted to dev@cloudstack.apache.org by Carlos Reátegui <cr...@gmail.com> on 2014/07/25 22:43:12 UTC

Re: Change IP subnet of cluster [SOLVED]

My system is back up and running. 

As I suspected in my second email, the problem was related to the msid in the mshost table.  On startup, a new mshost entry was being created for the same MS, and for some reason it was unable to connect to my XenServer hosts.

I decided to go back to my edited SQL dump with the new IPs and change the existing mshost entry to have the new msid value:

sed -i.bak4 's/159090355471823/159090355471825/g' cloudstack_cloud-newips.sql

I did a global replace, since there are foreign key constraints on the msid that I saw in the host and async_job tables.

After reloading this new SQL dump and starting the MS, everything is back to normal, but with a new subnet for my hosts and guests (this is a basic network).
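For anyone hitting the same symptom: the stale and current msid values can be read straight from the mshost table before deciding what to replace. A sketch only, assuming the default database name of cloud and the column names I recall from the 4.x schema; verify against your own installation:

```sql
-- List management server entries; two rows for the same MS with
-- different msid values is the symptom described above.
SELECT id, msid, name, state, version, service_ip FROM mshost;
```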


Question for the Devs:

Let’s say the machine my MS was running on crashed and I replaced it with a new machine.  Since the msid is derived from the MAC, wouldn’t I have encountered this same problem, leaving the new machine unable to connect to the hosts?
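For illustration only (this is not CloudStack’s actual derivation, just the failure mode the question describes): if the server id is packed from the NIC’s MAC, replacement hardware necessarily produces a different id, so existing references to the old id dangle.

```python
# Hypothetical sketch of a MAC-derived server id -- not CloudStack's
# real code, just an illustration of why new hardware yields a new id.
def mac_to_id(mac: str) -> int:
    """Pack a MAC address such as '02:00:ac:10:01:01' into a 48-bit integer."""
    return int(mac.replace(":", ""), 16)

old_id = mac_to_id("02:00:ac:10:01:01")  # original NIC
new_id = mac_to_id("02:00:ac:10:01:02")  # replacement NIC
assert old_id != new_id  # new MAC => new management server id
```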

thanks,
Carlos




On Jul 25, 2014, at 8:46 AM, Carlos Reátegui <cr...@gmail.com> wrote:

> Any thoughts out there?
> 
> It keeps trying to connect to the hosts but it is unable to and there are no clues in the logs as to why.  I am successfully connected with XenCenter to the pool and also am able to ssh to all the hosts from the MS.
> 
> What does “Disable Cluster” or “Unmanage Cluster” do?  Should I try that and re-enable/manage?
> 
> From the UI, things appear ok but starting any instance fails.
> 
> Thanks,
> Carlos
> 
> Log snippet from this a.m.:
> 
> 2014-07-25 21:03:19,599 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-2:null) Seq 2-65931749: Forwarding null to 159090355471823
> 2014-07-25 21:03:19,599 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-13:null) Seq 2-65931749: Routing from 159090355471825
> 2014-07-25 21:03:19,599 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-13:null) Seq 2-65931749: Link is closed
> 2014-07-25 21:03:19,600 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-13:null) Seq 2-65931749: MgmtId 159090355471825: Req: Resource [Host:2] is
>  unreachable: Host 2: Link is closed
> 2014-07-25 21:03:19,600 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-13:null) Seq 2--1: MgmtId 159090355471825: Req: Routing to peer
> 2014-07-25 21:03:19,601 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-14:null) Seq 2--1: MgmtId 159090355471825: Req: Cancel request received
> 2014-07-25 21:03:19,601 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-14:null) Seq 2-65931749: Cancelling.
> 2014-07-25 21:03:19,601 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-65931749: Waiting some more time because this is the current command
> 2014-07-25 21:03:19,601 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-65931749: Waiting some more time because this is the current command
> 2014-07-25 21:03:19,601 INFO  [utils.exception.CSExceptionErrorCode] (StatsCollector-2:null) Could not find exception: com.cloud.exception.OperationTimedoutException in
>  error code list for exceptions
> 2014-07-25 21:03:19,601 WARN  [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-65931749: Timed out on null
> 2014-07-25 21:03:19,601 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-65931749: Cancelling.
> 2014-07-25 21:03:19,601 WARN  [agent.manager.AgentManagerImpl] (StatsCollector-2:null) Operation timed out: Commands 65931749 to Host 2 timed out after 3600
> 2014-07-25 21:03:19,601 WARN  [cloud.resource.ResourceManagerImpl] (StatsCollector-2:null) Unable to obtain host 2 statistics. 
> 2014-07-25 21:03:19,601 WARN  [cloud.server.StatsCollector] (StatsCollector-2:null) Received invalid host stats for host: 2
> 2014-07-25 21:03:19,606 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-2:null) Seq 3-602278373: Forwarding null to 159090355471823
> 2014-07-25 21:03:19,607 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-15:null) Seq 3-602278373: Routing from 159090355471825
> 2014-07-25 21:03:19,607 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-15:null) Seq 3-602278373: Link is closed
> 2014-07-25 21:03:19,607 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-15:null) Seq 3-602278373: MgmtId 159090355471825: Req: Resource [Host:3] is unreachable: Host 3: Link is closed
> 2014-07-25 21:03:19,608 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-15:null) Seq 3--1: MgmtId 159090355471825: Req: Routing to peer
> 2014-07-25 21:03:19,608 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-4:null) Seq 3--1: MgmtId 159090355471825: Req: Cancel request received
> 2014-07-25 21:03:19,609 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-4:null) Seq 3-602278373: Cancelling.
> 2014-07-25 21:03:19,609 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-602278373: Waiting some more time because this is the current command
> 2014-07-25 21:03:19,609 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-602278373: Waiting some more time because this is the current command
> 2014-07-25 21:03:19,609 INFO  [utils.exception.CSExceptionErrorCode] (StatsCollector-2:null) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
> 2014-07-25 21:03:19,609 WARN  [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-602278373: Timed out on null
> 2014-07-25 21:03:19,609 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-602278373: Cancelling.
> 2014-07-25 21:03:19,609 WARN  [agent.manager.AgentManagerImpl] (StatsCollector-2:null) Operation timed out: Commands 602278373 to Host 3 timed out after 3600
> 2014-07-25 21:03:19,609 WARN  [cloud.resource.ResourceManagerImpl] (StatsCollector-2:null) Unable to obtain host 3 statistics. 
> 2014-07-25 21:03:19,609 WARN  [cloud.server.StatsCollector] (StatsCollector-2:null) Received invalid host stats for host: 3
> 2014-07-25 21:03:19,614 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-2:null) Seq 5-1311574501: Forwarding null to 159090355471823
> 2014-07-25 21:03:19,617 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-1:null) Seq 5-1311574501: Routing from 159090355471825
> 2014-07-25 21:03:19,617 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-1:null) Seq 5-1311574501: Link is closed
> 2014-07-25 21:03:19,617 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-1:null) Seq 5-1311574501: MgmtId 159090355471825: Req: Resource [Host:5] is unreachable: Host 5: Link is closed
> 
> 
> 
> On Jul 24, 2014, at 10:59 PM, Carlos Reátegui <ca...@reategui.com> wrote:
> 
>> Not sure if it is related, but I see 2 entries in the mshost table for the same server, with different msids.  Both show as ‘Up’.  Reading the table comments, it seems the msid is based on the MAC.  I am guessing this may be due to using a bond and that it may have selected a different NIC to get the bond MAC from.  Is it ok to have both of these entries?  Should I mark the old one as Down?
>> 
>> Along these lines, is there something similar with the hosts, and is that why the MS is having problems connecting to them, i.e. the MACs don’t match?
>> 
>> thanks,
>> Carlos
>> 
>> 
>> On Jul 24, 2014, at 3:35 PM, Carlos Reategui <ca...@reategui.com> wrote:
>> 
>>> Hi All,
>>> 
>>> Had to move one of my clusters to a new subnet but it is not working (e.g. 192.168.1.0/24 to 10.100.1.0/24).  These are the steps I took:
>>> 
>>> Environment: CS 4.1.1 on Ubuntu 12.04, XenServer 6.1, Shared NFS SR.
>>> 
>>> 1) stopped all instances using cloudstack UI
>>> 2) stop cloudstack-management service on MS
>>> 3) Used XenCenter to kill the system VMs (no other instances running)
>>> 4) Created backup of cloud db.
>>> 5) Followed http://support.citrix.com/article/CTX123477 and successfully changed the IP of hosts.  According to XenCenter everything is good including SR.
>>> 6) Changed IP of MS
>>> 7) verified communication between MS and Hosts using ssh and ping with new IPs.
>>> 8) used sed to search and replace all old IPs with new IPs in cloud backup sql file (e.g. sed -i.bak 's/192.168.1./10.100.1./g' clouddb.sql).
>>> 9) visually verified all diffs in the sql file and made sure no references to 192.168 left.
>>> 10) loaded up new sql
>>> 11) search all files under /etc on MS for old IP. found and edited: /etc/cloudstack/management/db.properties
>>> 12) start cloudstack-management service on MS
>>> 
>>> Unfortunately things are not working.  The MS is apparently unable to connect to the hosts but I can not figure out why from the logs.
>>> 
>>> Logs here: https://www.dropbox.com/s/s5glxrbyatmsoug/management-server.log
>>> 
>>> Any help recovering is appreciated.  I do not want to have to re-install and create/import template for each of the instance VHDs.
>>> 
>>> thank you,
>>> -Carlos
>> 
> 


Re: Change IP subnet of cluster [SOLVED]

Posted by Daan Hoogland <da...@gmail.com>.
Hi Carlos, glad you figured it out. A colleague had a similar issue, but
his finding was that the host table included a timestamp to identify
the management server.

You are right about replacing the management server: whether the id is
derived from a MAC or a timestamp, this will pose a problem.
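For what it’s worth, the msid fix could likely also be done in place on the live database rather than by re-editing the dump. A sketch only; the column names (mshost.msid, host.mgmt_server_id) are from memory of the 4.x schema, so verify against your own database, back it up first, and note the foreign key constraints Carlos mentioned are why checks are relaxed around the updates:

```sql
-- Temporarily relax FK checks so the referenced msid can be changed.
SET FOREIGN_KEY_CHECKS = 0;
UPDATE mshost SET msid = 159090355471825 WHERE msid = 159090355471823;
UPDATE host SET mgmt_server_id = 159090355471825
 WHERE mgmt_server_id = 159090355471823;
SET FOREIGN_KEY_CHECKS = 1;
```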

On Fri, Jul 25, 2014 at 10:43 PM, Carlos Reátegui <cr...@gmail.com> wrote:
> My system is back up and running.
>
> As I suspected in my second email the problem was related to the msid in the mshost table.  Upon bringing up my system a new mshost entry was being created for the same MS and for some reason it was unable to connect to my XenServer hosts.
>
> I decided to go back to my edited sql with the new IPs and change the existing mshost entry to have the new msid value:
>
> sed -i.bak4 's/159090355471823/159090355471825/g' cloudstack_cloud-newips.sql
>
> I did a global replace since there are foreign key constraints on the msid that I saw in the host and async_job tables
>
> After reloading this new sql and starting the MS everything is back to normal, but with a new subnet for my hosts and guests (this is a basic network).
>
>
> Question for the Devs:
>
> Let’s say the machine my MS was running on crashed and I replaced it with a new machine.  Since the msid is derived from the MAC, wouldn’t I have encountered this same problem, leaving the new machine unable to connect to the hosts?
>
> thanks,
> Carlos
>
>
>
>
> On Jul 25, 2014, at 8:46 AM, Carlos Reátegui <cr...@gmail.com> wrote:
>
>> Any thoughts out there?
>>
>> It keeps trying to connect to the hosts but it is unable to and there are no clues in the logs as to why.  I am successfully connected with XenCenter to the pool and also am able to ssh to all the hosts from the MS.
>>
>> What does “Disable Cluster” or “Unmanage Cluster” do?  Should I try that and re-enable/manage?
>>
>> From the UI, things appear ok but starting any instance fails.
>>
>> Thanks,
>> Carlos
>>
>> Log snippet from this a.m.:
>>
>> 2014-07-25 21:03:19,599 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-2:null) Seq 2-65931749: Forwarding null to 159090355471823
>> 2014-07-25 21:03:19,599 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-13:null) Seq 2-65931749: Routing from 159090355471825
>> 2014-07-25 21:03:19,599 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-13:null) Seq 2-65931749: Link is closed
>> 2014-07-25 21:03:19,600 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-13:null) Seq 2-65931749: MgmtId 159090355471825: Req: Resource [Host:2] is
>>  unreachable: Host 2: Link is closed
>> 2014-07-25 21:03:19,600 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-13:null) Seq 2--1: MgmtId 159090355471825: Req: Routing to peer
>> 2014-07-25 21:03:19,601 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-14:null) Seq 2--1: MgmtId 159090355471825: Req: Cancel request received
>> 2014-07-25 21:03:19,601 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-14:null) Seq 2-65931749: Cancelling.
>> 2014-07-25 21:03:19,601 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-65931749: Waiting some more time because this is the current command
>> 2014-07-25 21:03:19,601 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-65931749: Waiting some more time because this is the current command
>> 2014-07-25 21:03:19,601 INFO  [utils.exception.CSExceptionErrorCode] (StatsCollector-2:null) Could not find exception: com.cloud.exception.OperationTimedoutException in
>>  error code list for exceptions
>> 2014-07-25 21:03:19,601 WARN  [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-65931749: Timed out on null
>> 2014-07-25 21:03:19,601 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-65931749: Cancelling.
>> 2014-07-25 21:03:19,601 WARN  [agent.manager.AgentManagerImpl] (StatsCollector-2:null) Operation timed out: Commands 65931749 to Host 2 timed out after 3600
>> 2014-07-25 21:03:19,601 WARN  [cloud.resource.ResourceManagerImpl] (StatsCollector-2:null) Unable to obtain host 2 statistics.
>> 2014-07-25 21:03:19,601 WARN  [cloud.server.StatsCollector] (StatsCollector-2:null) Received invalid host stats for host: 2
>> 2014-07-25 21:03:19,606 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-2:null) Seq 3-602278373: Forwarding null to 159090355471823
>> 2014-07-25 21:03:19,607 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-15:null) Seq 3-602278373: Routing from 159090355471825
>> 2014-07-25 21:03:19,607 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-15:null) Seq 3-602278373: Link is closed
>> 2014-07-25 21:03:19,607 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-15:null) Seq 3-602278373: MgmtId 159090355471825: Req: Resource [Host:3] is unreachable: Host 3: Link is closed
>> 2014-07-25 21:03:19,608 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-15:null) Seq 3--1: MgmtId 159090355471825: Req: Routing to peer
>> 2014-07-25 21:03:19,608 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-4:null) Seq 3--1: MgmtId 159090355471825: Req: Cancel request received
>> 2014-07-25 21:03:19,609 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-4:null) Seq 3-602278373: Cancelling.
>> 2014-07-25 21:03:19,609 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-602278373: Waiting some more time because this is the current command
>> 2014-07-25 21:03:19,609 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-602278373: Waiting some more time because this is the current command
>> 2014-07-25 21:03:19,609 INFO  [utils.exception.CSExceptionErrorCode] (StatsCollector-2:null) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
>> 2014-07-25 21:03:19,609 WARN  [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-602278373: Timed out on null
>> 2014-07-25 21:03:19,609 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-602278373: Cancelling.
>> 2014-07-25 21:03:19,609 WARN  [agent.manager.AgentManagerImpl] (StatsCollector-2:null) Operation timed out: Commands 602278373 to Host 3 timed out after 3600
>> 2014-07-25 21:03:19,609 WARN  [cloud.resource.ResourceManagerImpl] (StatsCollector-2:null) Unable to obtain host 3 statistics.
>> 2014-07-25 21:03:19,609 WARN  [cloud.server.StatsCollector] (StatsCollector-2:null) Received invalid host stats for host: 3
>> 2014-07-25 21:03:19,614 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-2:null) Seq 5-1311574501: Forwarding null to 159090355471823
>> 2014-07-25 21:03:19,617 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-1:null) Seq 5-1311574501: Routing from 159090355471825
>> 2014-07-25 21:03:19,617 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-1:null) Seq 5-1311574501: Link is closed
>> 2014-07-25 21:03:19,617 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-1:null) Seq 5-1311574501: MgmtId 159090355471825: Req: Resource [Host:5] is unreachable: Host 5: Link is closed
>>
>>
>>
>> On Jul 24, 2014, at 10:59 PM, Carlos Reátegui <ca...@reategui.com> wrote:
>>
>>> Not sure if it is related, but I see 2 entries in the mshost table for the same server, with different msids.  Both show as ‘Up’.  Reading the table comments, it seems the msid is based on the MAC.  I am guessing this may be due to using a bond and that it may have selected a different NIC to get the bond MAC from.  Is it ok to have both of these entries?  Should I mark the old one as Down?
>>>
>>> Along these lines, is there something similar with the hosts, and is that why the MS is having problems connecting to them, i.e. the MACs don’t match?
>>>
>>> thanks,
>>> Carlos
>>>
>>>
>>> On Jul 24, 2014, at 3:35 PM, Carlos Reategui <ca...@reategui.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Had to move one of my clusters to a new subnet but it is not working (e.g. 192.168.1.0/24 to 10.100.1.0/24).  These are the steps I took:
>>>>
>>>> Environment: CS 4.1.1 on Ubuntu 12.04, XenServer 6.1, Shared NFS SR.
>>>>
>>>> 1) stopped all instances using cloudstack UI
>>>> 2) stop cloudstack-management service on MS
>>>> 3) Used XenCenter to kill the system VMs (no other instances running)
>>>> 4) Created backup of cloud db.
>>>> 5) Followed http://support.citrix.com/article/CTX123477 and successfully changed the IP of hosts.  According to XenCenter everything is good including SR.
>>>> 6) Changed IP of MS
>>>> 7) verified communication between MS and Hosts using ssh and ping with new IPs.
>>>> 8) used sed to search and replace all old IPs with new IPs in cloud backup sql file (e.g. sed -i.bak 's/192.168.1./10.100.1./g' clouddb.sql).
>>>> 9) visually verified all diffs in the sql file and made sure no references to 192.168 left.
>>>> 10) loaded up new sql
>>>> 11) search all files under /etc on MS for old IP. found and edited: /etc/cloudstack/management/db.properties
>>>> 12) start cloudstack-management service on MS
>>>>
>>>> Unfortunately things are not working.  The MS is apparently unable to connect to the hosts but I can not figure out why from the logs.
>>>>
>>>> Logs here: https://www.dropbox.com/s/s5glxrbyatmsoug/management-server.log
>>>>
>>>> Any help recovering is appreciated.  I do not want to have to re-install and create/import template for each of the instance VHDs.
>>>>
>>>> thank you,
>>>> -Carlos
>>>
>>
>



-- 
Daan
