Posted to users@cloudstack.apache.org by Stephan Seitz <s....@secretresearchfacility.com> on 2016/02/16 15:12:08 UTC

ACS management unable to connect to xenserver hosts after reboot

Hi acs gurus!

We're currently facing a really strange problem after two seemingly
simple steps:
1. Reboot the management node (which also hosts a second NFS storage)
2. Upgrade from 4.7.0 to 4.7.1

Both steps seemed to succeed, but after a few days I noticed the SSVM
in "running, not connected" state, so I decided to restart the SSVM.
That's where all the trouble began...

I've pasted a somewhat repetitive log excerpt here:
http://pastebin.com/8MM6XUBk

If I try to (force) reconnect a host, we get huge repetitive log
entries like the ones pasted here: http://pastebin.com/cNR3TtkG

Cloudmonkey quits with the following response:

(local) 🐡 > reconnect host id=df4182f8-24a0-40ca-9ccc-6489f374cd4c
Error Connection refused by server: ('Connection aborted.',
BadStatusLine("''",))


I've tcpdump'ed the relevant traffic between the management server and
the XenServers and found nothing except some (I assume) unrelated NFS
packets.

Could someone please shed some light on how to fix this?

Thanks in advance!

- Stephan


Re: [update] ACS management unable to connect to xenserver hosts after reboot

Posted by Stephan Seitz <s....@secretresearchfacility.com>.
Glenn,

thanks for your reply. Unfortunately the SSVM has been destroyed.

We don't have any firewall in between. ACS and XenServers are located in
the same /22. I've double checked every connection and there's no
iptables or similar in the way.
Instead of the SSVM, I've just verified that the console proxy VM is
able to connect to port 8250.
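
For anyone who wants to repeat the port 8250 check without telnet, here is a minimal sketch in Python. The function name `can_connect` and the 3-second timeout are illustrative choices, not anything from CloudStack; the check only tells you whether a TCP connection can be opened, nothing about the agent protocol on top of it.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

Run from a system VM or a hypervisor host, `can_connect("<management-ip>", 8250)` would return True when the port is reachable; `<management-ip>` is a placeholder for the management server's address.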

To me it looks like there's some strange "identity" problem.

mysql> select * from mshost;
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
| id | msid           | runid         | name             | state | version | service_ip | service_port | last_update         | removed | alert_count |
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
|  1 | 57177340185274 | 1455209855143 | acs-management-1 | Up    | 4.7.1   | 10.97.13.1 |         9090 | 2016-02-12 16:55:56 | NULL    |           0 |
|  3 | 57177340185273 | 1455639355379 | acs-management-1 | Up    | 4.7.1   | 10.97.13.1 |         9090 | 2016-02-17 11:31:50 | NULL    |           0 |
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
2 rows in set (0.00 sec)

Indeed, there is (and always has been) only one management host in this
infrastructure.
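
The symptom in the mshost output (one hostname and service IP, two different msid values) can be spotted mechanically. The sketch below is only an illustration: the rows are copied by hand from the query result above, reduced to five columns, and `find_duplicate_hosts` is a hypothetical helper, not part of CloudStack.

```python
from collections import defaultdict

# Rows copied from the mshost output: (id, msid, name, service_ip, state)
MSHOST_ROWS = [
    (1, 57177340185274, "acs-management-1", "10.97.13.1", "Up"),
    (3, 57177340185273, "acs-management-1", "10.97.13.1", "Up"),
]


def find_duplicate_hosts(rows):
    """Group rows by (name, service_ip); return groups with more than one msid."""
    groups = defaultdict(set)
    for _id, msid, name, service_ip, _state in rows:
        groups[(name, service_ip)].add(msid)
    return {key: sorted(msids) for key, msids in groups.items() if len(msids) > 1}


print(find_duplicate_hosts(MSHOST_ROWS))
# {('acs-management-1', '10.97.13.1'): [57177340185273, 57177340185274]}
```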

With SQL dumps at hand, we removed the second row and purged all the
jobs related to that id, but after restarting cloudstack-management,
this entry was created again.

Maybe, I'm completely wrong, but is it possible that our management host
"thinks" there's another management host responsible for our cluster?

Since we've been fiddling with this for at least two days without any
success, I'm willing to spend a few consulting hours on it.

cheers,

- Stephan

On Tuesday, 16.02.2016, at 20:39 +0000, Glenn Wagner wrote:
> Hi Stephan,
> 
> Check that you can telnet to port 8250 on the management server from
> the SSVM, and check that iptables has been set up correctly.
> It looks like a firewall issue on the ACS management server.
> 
> Thanks
> Glenn
> 
> Glenn Wagner
> Senior Consultant, ShapeBlue
> s: +27 21 527 0091 | m: +27 73 917 4111
> e: glenn.wagner@shapeblue.com | w: www.shapeblue.com
> a: 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West Cape Town 7130 South Africa
> 
> -----Original Message-----
> From: Stephan Seitz [mailto:s.seitz@secretresearchfacility.com] 
> Sent: Tuesday, 16 February 2016 5:19 PM
> To: users@cloudstack.apache.org
> Cc: dev@cloudstack.apache.org
> Subject: [update] ACS management unable to connect to xenserver hosts
> after reboot
> 
> Hi again!
> 
> I think we've found the root source, but are unable to mitigate that:
> 
> 2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentManager-Handler-8:null) Seq 6--1: MgmtId 57177340185273: Req: Routing to peer
> 2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentManager-Handler-9:null) Seq 6--1: MgmtId 57177340185273: Req: Cancel request received
> 2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentManager-Handler-10:null) Seq 1-4458000681143369786: MgmtId 57177340185273: Req: Resource [Host:1] is unreachable: Host 1: Link is closed
> 2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentManager-Handler-10:null) Seq 1--1: MgmtId 57177340185273: Req: Routing to peer
> 2016-02-16 16:13:22,900 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentManager-Handler-11:null) Seq 1--1: MgmtId 57177340185273: Req: Cancel request received
> 2016-02-16 16:13:22,905 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentManager-Handler-12:null) Seq 3-2144839322535198778: MgmtId 57177340185273: Req: Resource [Host:3] is unreachable: Host 3: Link is closed
> 
> Here's a longer excerpt from the logfile during startup:
> 
> http://pastebin.com/SftVJCs4
> 
> Maybe someone knows how to resolve this? To me it looks like our
> single management-host has some kind of identity crisis? 
> 
> 
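
The ClusteredAgentManagerImpl excerpt quoted above can be summarised with a few lines of Python. This is a rough sketch: the regular expression is written against the message format seen in that excerpt only, it assumes every log entry is on a single line (entries wrapped by a mail client would need to be rejoined first), and `unreachable_hosts` is an illustrative name.

```python
import re
from collections import defaultdict

# Matches e.g. "MgmtId 57177340185273: Req: Resource [Host:1] is unreachable"
UNREACHABLE = re.compile(
    r"MgmtId (\d+): Req: Resource \[Host:(\d+)\] is unreachable"
)


def unreachable_hosts(lines):
    """Map each management id to the set of host ids it reported as unreachable."""
    result = defaultdict(set)
    for line in lines:
        match = UNREACHABLE.search(line)
        if match:
            result[int(match.group(1))].add(int(match.group(2)))
    return dict(result)


SAMPLE = [
    "2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] "
    "(AgentManager-Handler-10:null) Seq 1-4458000681143369786: MgmtId "
    "57177340185273: Req: Resource [Host:1] is unreachable: Host 1: Link is closed",
    "2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] "
    "(AgentManager-Handler-10:null) Seq 1--1: MgmtId 57177340185273: Req: "
    "Routing to peer",
    "2016-02-16 16:13:22,905 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] "
    "(AgentManager-Handler-12:null) Seq 3-2144839322535198778: MgmtId "
    "57177340185273: Req: Resource [Host:3] is unreachable: Host 3: Link is closed",
]

print(unreachable_hosts(SAMPLE))
# {57177340185273: {1, 3}}
```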



Re: [update] ACS management unable to connect to xenserver hosts after reboot

Posted by Stephan Seitz <s....@secretresearchfacility.com>.
Paul,

thank you for your hint! That was the root cause of our problems:

https://bugs.launchpad.net/ubuntu/+source/ifenslave/+bug/1288196

We simply didn't know that the msid is derived from the MAC address.
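
To illustrate the relationship, the two msid values from the mshost table posted earlier in this thread can be rendered as MAC-style strings. This is a sketch under one loud assumption: that the msid is simply the 48-bit MAC address read as a big-endian integer. The thread only says the msid is "generated from" the MAC, so the exact encoding is not confirmed here, and `msid_to_mac` is a made-up helper name.

```python
def msid_to_mac(msid: int) -> str:
    """Render an msid as a MAC-style string.

    ASSUMPTION: the msid is the 48-bit MAC read as a big-endian integer.
    """
    raw = msid.to_bytes(6, "big")  # raises OverflowError if msid exceeds 48 bits
    return ":".join(f"{byte:02x}" for byte in raw)


for msid in (57177340185274, 57177340185273):
    print(msid, msid_to_mac(msid))
# 57177340185274 34:00:a3:0d:0a:ba
# 57177340185273 34:00:a3:0d:0a:b9
```

Under that assumption the two values differ only in the last byte, which would be consistent with two adjacent interface addresses on the same machine.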

Our services tend to be manageable again ;)

Thanks again guys!

cheers,

- Stephan


On Wednesday, 17.02.2016, at 19:16 +0000, Paul Angus wrote:
> The msid is generated from the MAC address of the host when the service starts. The two IDs are subtly different. Do you have some bonding in place that is maybe misconfigured and is generating the second MAC?
> 
> 
> 
> Paul Angus
> VP Technology   ,       ShapeBlue






RE: [update] ACS management unable to connect to xenserver hosts after reboot

Posted by Paul Angus <pa...@shapeblue.com>.
The msid is generated from the MAC address of the host when the service starts. The two IDs are subtly different. Do you have some bonding in place that is maybe misconfigured and is generating the second MAC?



Paul Angus
VP Technology, ShapeBlue
t: @cloudyangus
e: paul.angus@shapeblue.com | w: www.shapeblue.com





-----Original Message-----
From: Simon Weller [mailto:sweller@ena.com]
Sent: Wednesday, February 17, 2016 6:11 PM
To: dev@cloudstack.apache.org
Cc: Glenn Wagner <gl...@shapeblue.com>
Subject: Re: [update] ACS management unable to connect to xenserver hosts after reboot

Stephan,

When you restart the management process, do you see any logs indicating it's trying to peer with another management server?

- Si


Re: [update] ACS management unable to connect to xenserver hosts after reboot

Posted by Simon Weller <sw...@ena.com>.
Stephan,

When you restart the management process, do you see any logs indicating it's trying to peer with another management server?

- Si

________________________________________
From: Stephan Seitz <s....@secretresearchfacility.com>
Sent: Wednesday, February 17, 2016 9:28 AM
To: dev@cloudstack.apache.org
Cc: Glenn Wagner
Subject: Re: [update] ACS management unable to connect to xenserver hosts after reboot

Glenn,

thanks for your reply. Unfortunately the SSVM has been destroyed.

We don't have any firewall in between. ACS and XenServers are located in
the same /22. I've double checked every connection and there's no
iptables or similar in the way.
Instead of the SSVM, I've just successfully checked if the consoleproxy
VM is able to connect to Port 8250.

To me it looks, like there's some strange "identity" problem.

mysql> select * from mshost;
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
| id | msid           | runid         | name             | state |
version | service_ip | service_port | last_update         | removed |
alert_count |
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
|  1 | 57177340185274 | 1455209855143 | acs-management-1 | Up    | 4.7.1
| 10.97.13.1 |         9090 | 2016-02-12 16:55:56 | NULL    |
0 |
|  3 | 57177340185273 | 1455639355379 | acs-management-1 | Up    | 4.7.1
| 10.97.13.1 |         9090 | 2016-02-17 11:31:50 | NULL    |
0 |
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
2 rows in set (0.00 sec)

Indeed, there is (and always has been) only one management host in this
infrastructure.

With sqldumps at hand, we removed the second row and purged all the
related jobs to that id, but after restarting cloudstack-management,
this entry wasi created again.

Maybe, I'm completely wrong, but is it possible that our management host
"thinks" there's another management host responsible for our cluster?

Since we're fiddling at least two days without any success here, I'm
willing to get a few consulting hours thrown on that.

cheers,

- Stephan

btw. sorry, if this is a double post, but I think the list ate my last
mail...


Am Dienstag, den 16.02.2016, 20:39 +0000 schrieb Glenn Wagner:
> Hi Stephan,
>
> Check that you can telnet port 8250 on the management server from
> SSVM , check that iptables has been setup correctly
> Looks like it’s a firewall issue on the ACS Management server
>
> Thanks
> Glenn
>
>
>
>
>
> ShapeBlue
> Glenn Wagner
> Senior
> Consultant
> ,
> ShapeBlue
> d:
>  | s: +27 21 527 0091
>  |
> m:
> +27 73 917 4111
> e:
> glenn.wagner@shapeblue.com | t:
>  |
> w:
> www.shapeblue.com
> a:
> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West Cape Town 7130 South Africa
>
> Shape Blue Ltd is a company incorporated in England & Wales. ShapeBlue
> Services India LLP is a company incorporated in India and is operated
> under license from Shape Blue Ltd. Shape Blue Brasil Consultoria Ltda
> is a company incorporated in Brasil and is operated under license from
> Shape Blue Ltd. ShapeBlue SA Pty Ltd is a company registered by The
> Republic of South Africa and is traded under license from Shape Blue
> Ltd. ShapeBlue is a registered trademark.
> This email and any attachments to it may be confidential and are
> intended solely for the use of the individual to whom it is addressed.
> Any views or opinions expressed are solely those of the author and do
> not necessarily represent those of Shape Blue Ltd or related
> companies. If you are not the intended recipient of this email, you
> must neither take any action based upon its contents, nor copy or show
> it to anyone. Please contact the sender if you believe you have
> received this email in error.
>
>
>
>
>
> -----Original Message-----
> From: Stephan Seitz [mailto:s.seitz@secretresearchfacility.com]
> Sent: Tuesday, 16 February 2016 5:19 PM
> To: users@cloudstack.apache.org
> Cc: dev@cloudstack.apache.org
> Subject: [update] ACS management unable to connect to xenserver hosts
> after reboot
>
> Hi again!
>
> I think we've found the root source, but are unable to mitigate that:
>
> 2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
> (AgentManager-Handler-8:null) Seq 6--1: MgmtId 57177340185273: Req:
> Routing to peer
> 2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
> (AgentManager-Handler-9:null) Seq 6--1: MgmtId 57177340185273: Req:
> Cancel request received
> 2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
> (AgentManager-Handler-10:null) Seq 1-4458000681143369786: MgmtId
> 57177340185273: Req: Resource [Host:1] is unreachable: Host 1: Link is
> closed
> 2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
> (AgentManager-Handler-10:null) Seq 1--1: MgmtId 57177340185273: Req:
> Routing to peer
> 2016-02-16 16:13:22,900 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
> (AgentManager-Handler-11:null) Seq 1--1: MgmtId 57177340185273: Req:
> Cancel request received
> 2016-02-16 16:13:22,905 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
> (AgentManager-Handler-12:null) Seq 3-2144839322535198778: MgmtId
> 57177340185273: Req: Resource [Host:3] is unreachable: Host 3: Link is
> closed
>
> Here's a longer excerpt from the logfile during startup:
>
> http://pastebin.com/SftVJCs4
>
> Maybe someone knows how to resolve this? To me it looks like our
> single management-host has some kind of identity crisis?
>
>
> Am Dienstag, den 16.02.2016, 15:12 +0100 schrieb Stephan Seitz:
> > Hi acs gurus!
> >
> > We're currently facing a really strange problem after two somewhat
> > simple steps.
> > 1. Reboot Management-Node (well there is also a 2nd. NFS-Storage
> > located)
> > 2. Upgrade 4.7.0 to 4.7.1
> >
> > Both steps seemed successful and running, but after a few days I've
> > noticed the SSVM in "running, not connected" state, so I decided to
> > restart the SSVM. That's where all the trouble began...
> >
> > I've pasted a somewhat repetitive log excerpt here
> > http://pastebin.com/8MM6XUBk
> >
> > If I try to (force) reconnect a host, we're getting huge repetitive log
> > entries like pasted here http://pastebin.com/cNR3TtkG
> >
> > Cloudmonkey quits with following Response:
> >
> > (local) 🐡 > reconnect host id=df4182f8-24a0-40ca-9ccc-6489f374cd4c
> > Error Connection refused by server: ('Connection aborted.',
> > BadStatusLine("''",))
> >
> >
> > I've tcpdump'ed relevant traffic between management and xenservers and
> > found simply nothing except some (I assume) unrelated NFS packets.
> >
> > Could someone please shed some light on how to fix that?
> >
> > Thanks in advance!
> >
> > - Stephan
>
>
>



Re: [update] ACS management unable to connect to xenserver hosts after reboot

Posted by Stephan Seitz <s....@secretresearchfacility.com>.
Glenn,

Thanks for your reply. Unfortunately, the SSVM has been destroyed.

We don't have any firewall in between. ACS and the XenServers are located in
the same /22. I've double-checked every connection, and there's no
iptables or similar in the way.
Instead of the SSVM, I've just successfully verified that the consoleproxy
VM is able to connect to port 8250.
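
The port check described above can also be scripted instead of using telnet. The following is only a minimal sketch: the helper name `can_connect` and the 3-second timeout are illustrative assumptions, and the address in the usage comment is the `service_ip` value that appears in this thread, which may not be the address the agents actually use.

```python
import socket


def can_connect(host: str, port: int = 8250, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only proves that the TCP handshake succeeds (roughly what
    "telnet host 8250" shows); it says nothing about whether the
    management server accepts the agent afterwards.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


# Hypothetical usage from a system VM or hypervisor host:
#   can_connect("10.97.13.1", 8250)
```

A successful check like this rules out a plain firewall block, but not a problem at the application level.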

To me it looks like there's some strange "identity" problem.

mysql> select * from mshost;
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
| id | msid           | runid         | name             | state | version | service_ip | service_port | last_update         | removed | alert_count |
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
|  1 | 57177340185274 | 1455209855143 | acs-management-1 | Up    | 4.7.1   | 10.97.13.1 |         9090 | 2016-02-12 16:55:56 | NULL    |           0 |
|  3 | 57177340185273 | 1455639355379 | acs-management-1 | Up    | 4.7.1   | 10.97.13.1 |         9090 | 2016-02-17 11:31:50 | NULL    |           0 |
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
2 rows in set (0.00 sec)
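
To make the two rows easier to compare, here is a small sketch that flags entries still marked "Up" whose last_update is old. It assumes that last_update is refreshed regularly by a running management server, which is not confirmed anywhere in this thread; the function name `stale_up_rows` and the 5-minute threshold are made up for illustration. The rows are copied from the query output above.

```python
from datetime import datetime, timedelta

# (id, msid, state, last_update) copied from the mshost query output.
MSHOST_ROWS = [
    (1, 57177340185274, "Up", "2016-02-12 16:55:56"),
    (3, 57177340185273, "Up", "2016-02-17 11:31:50"),
]


def stale_up_rows(rows, now, max_age=timedelta(minutes=5)):
    """Return the ids of rows in state "Up" whose last_update is older than max_age.

    max_age is an arbitrary illustrative threshold, not a CloudStack setting.
    """
    stale = []
    for row_id, _msid, state, last_update in rows:
        seen = datetime.strptime(last_update, "%Y-%m-%d %H:%M:%S")
        if state == "Up" and now - seen > max_age:
            stale.append(row_id)
    return stale


# Use the newest last_update in the table as the reference time.
now = datetime(2016, 2, 17, 11, 31, 50)
print(stale_up_rows(MSHOST_ROWS, now))  # [1]
```

With these values only row id 1 (msid 57177340185274) is flagged, while the MgmtId in the log lines, 57177340185273, belongs to row id 3. The sketch only points out candidates for a closer look; it does not say which row is safe to remove.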

Indeed, there is (and always has been) only one management host in this
infrastructure.

With SQL dumps at hand, we removed the second row and purged all the
jobs related to that id, but after restarting cloudstack-management,
this entry was created again.

Maybe I'm completely wrong, but is it possible that our management host
"thinks" there's another management host responsible for our cluster?

Since we've been fiddling with this for at least two days without any
success, I'm willing to throw a few consulting hours at it.

cheers,

- Stephan

By the way, sorry if this is a double post, but I think the list ate my last
mail...


On Tuesday, 16.02.2016, at 20:39 +0000, Glenn Wagner wrote:
> Hi Stephan,
> 
> Check that you can telnet to port 8250 on the management server from
> the SSVM, and check that iptables has been set up correctly.
> It looks like a firewall issue on the ACS management server.
> 
> Thanks
> Glenn
> 
> 
> 
> 
> 
> Glenn Wagner
> Senior Consultant, ShapeBlue
> s: +27 21 527 0091 | m: +27 73 917 4111
> e: glenn.wagner@shapeblue.com | w: www.shapeblue.com
> a: 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West Cape Town 7130 South Africa
> 
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Stephan Seitz [mailto:s.seitz@secretresearchfacility.com] 
> Sent: Tuesday, 16 February 2016 5:19 PM
> To: users@cloudstack.apache.org
> Cc: dev@cloudstack.apache.org
> Subject: [update] ACS management unable to connect to xenserver hosts
> after reboot
> 
> Hi again!
> 
> I think we've found the root source, but are unable to mitigate that:
> 
> 2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
> (AgentManager-Handler-8:null) Seq 6--1: MgmtId 57177340185273: Req:
> Routing to peer
> 2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
> (AgentManager-Handler-9:null) Seq 6--1: MgmtId 57177340185273: Req:
> Cancel request received
> 2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
> (AgentManager-Handler-10:null) Seq 1-4458000681143369786: MgmtId
> 57177340185273: Req: Resource [Host:1] is unreachable: Host 1: Link is
> closed
> 2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
> (AgentManager-Handler-10:null) Seq 1--1: MgmtId 57177340185273: Req:
> Routing to peer
> 2016-02-16 16:13:22,900 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
> (AgentManager-Handler-11:null) Seq 1--1: MgmtId 57177340185273: Req:
> Cancel request received
> 2016-02-16 16:13:22,905 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
> (AgentManager-Handler-12:null) Seq 3-2144839322535198778: MgmtId
> 57177340185273: Req: Resource [Host:3] is unreachable: Host 3: Link is
> closed
> 
> Here's a longer excerpt from the logfile during startup:
> 
> http://pastebin.com/SftVJCs4
> 
> Maybe someone knows how to resolve this? To me it looks like our
> single management-host has some kind of identity crisis? 
> 
> 
> On Tuesday, 16.02.2016, at 15:12 +0100, Stephan Seitz wrote:
> > Hi acs gurus!
> > 
> > We're currently facing a really strange problem after two somewhat 
> > simple steps.
> > 1. Reboot Management-Node (well there is also a 2nd. NFS-Storage
> > located)
> > 2. Upgrade 4.7.0 to 4.7.1
> > 
> > Both steps seemed successful and running, but after a few days I've 
> > noticed the SSVM in "running, not connected" state, so I decided to 
> > restart the SSVM. That's where all the trouble began...
> >
> > I've pasted a somewhat repetitive log excerpt here
> > http://pastebin.com/8MM6XUBk
> >
> > If I try to (force) reconnect a host, we're getting huge repetitive log
> > entries like pasted here http://pastebin.com/cNR3TtkG
> > 
> > Cloudmonkey quits with following Response:
> > 
> > (local) 🐡 > reconnect host id=df4182f8-24a0-40ca-9ccc-6489f374cd4c
> > Error Connection refused by server: ('Connection aborted.',
> > BadStatusLine("''",))
> > 
> > 
> > I've tcpdump'ed relevant traffic between management and xenservers and
> > found simply nothing except some (I assume) unrelated NFS packets.
> >
> > Could someone please shed some light on how to fix that?
> > 
> > Thanks in advance!
> > 
> > - Stephan
> 
> 
> 
> 



RE: [update] ACS management unable to connect to xenserver hosts after reboot

Posted by Glenn Wagner <gl...@shapeblue.com>.
Hi Stephan,

Check that you can telnet to port 8250 on the management server from the SSVM, and check that iptables has been set up correctly.
It looks like a firewall issue on the ACS management server.

Thanks
Glenn





Glenn Wagner
Senior Consultant, ShapeBlue

s: +27 21 527 0091 | m: +27 73 917 4111
e: glenn.wagner@shapeblue.com | w: www.shapeblue.com
a: 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West Cape Town 7130 South Africa


Shape Blue Ltd is a company incorporated in England & Wales. ShapeBlue Services India LLP is a company incorporated in India and is operated under license from Shape Blue Ltd. Shape Blue Brasil Consultoria Ltda is a company incorporated in Brasil and is operated under license from Shape Blue Ltd. ShapeBlue SA Pty Ltd is a company registered by The Republic of South Africa and is traded under license from Shape Blue Ltd. ShapeBlue is a registered trademark.
This email and any attachments to it may be confidential and are intended solely for the use of the individual to whom it is addressed. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Shape Blue Ltd or related companies. If you are not the intended recipient of this email, you must neither take any action based upon its contents, nor copy or show it to anyone. Please contact the sender if you believe you have received this email in error.




-----Original Message-----
From: Stephan Seitz [mailto:s.seitz@secretresearchfacility.com]
Sent: Tuesday, 16 February 2016 5:19 PM
To: users@cloudstack.apache.org
Cc: dev@cloudstack.apache.org
Subject: [update] ACS management unable to connect to xenserver hosts after reboot

Hi again!

I think we've found the root source, but are unable to mitigate that:

2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-8:null) Seq 6--1: MgmtId 57177340185273: Req:
Routing to peer
2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-9:null) Seq 6--1: MgmtId 57177340185273: Req:
Cancel request received
2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-10:null) Seq 1-4458000681143369786: MgmtId
57177340185273: Req: Resource [Host:1] is unreachable: Host 1: Link is closed
2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-10:null) Seq 1--1: MgmtId 57177340185273: Req:
Routing to peer
2016-02-16 16:13:22,900 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-11:null) Seq 1--1: MgmtId 57177340185273: Req:
Cancel request received
2016-02-16 16:13:22,905 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-12:null) Seq 3-2144839322535198778: MgmtId
57177340185273: Req: Resource [Host:3] is unreachable: Host 3: Link is closed

Here's a longer excerpt from the logfile during startup:

http://pastebin.com/SftVJCs4

Maybe someone knows how to resolve this? To me it looks like our single management-host has some kind of identity crisis?


On Tuesday, 16.02.2016, at 15:12 +0100, Stephan Seitz wrote:
> Hi acs gurus!
>
> We're currently facing a really strange problem after two somewhat
> simple steps.
> 1. Reboot Management-Node (well there is also a 2nd. NFS-Storage
> located)
> 2. Upgrade 4.7.0 to 4.7.1
>
> Both steps seemed successful and running, but after a few days I've
> noticed the SSVM in "running, not connected" state, so I decided to
> restart the SSVM. That's where all the trouble began...
>
> I've pasted a somewhat repetitive log excerpt here
> http://pastebin.com/8MM6XUBk
>
> If I try to (force) reconnect a host, we're getting huge repetitive log
> entries like pasted here http://pastebin.com/cNR3TtkG
>
> Cloudmonkey quits with following Response:
>
> (local) 🐡 > reconnect host id=df4182f8-24a0-40ca-9ccc-6489f374cd4c
> Error Connection refused by server: ('Connection aborted.',
> BadStatusLine("''",))
>
>
> I've tcpdump'ed relevant traffic between management and xenservers and
> found simply nothing except some (I assume) unrelated NFS packets.
>
> Could someone please shed some light on how to fix that?
>
> Thanks in advance!
>
> - Stephan



[update] ACS management unable to connect to xenserver hosts after reboot

Posted by Stephan Seitz <s....@secretresearchfacility.com>.
Hi again!

I think we've found the root cause, but we're unable to mitigate it:

2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-8:null) Seq 6--1: MgmtId 57177340185273: Req:
Routing to peer
2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-9:null) Seq 6--1: MgmtId 57177340185273: Req:
Cancel request received
2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-10:null) Seq 1-4458000681143369786: MgmtId
57177340185273: Req: Resource [Host:1] is unreachable: Host 1: Link is
closed
2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-10:null) Seq 1--1: MgmtId 57177340185273: Req:
Routing to peer
2016-02-16 16:13:22,900 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-11:null) Seq 1--1: MgmtId 57177340185273: Req:
Cancel request received
2016-02-16 16:13:22,905 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-12:null) Seq 3-2144839322535198778: MgmtId
57177340185273: Req: Resource [Host:3] is unreachable: Host 3: Link is
closed
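
For longer excerpts it can help to pull out just the management id and the affected host ids. The sketch below is a rough illustration, not a CloudStack tool: the regular expressions are derived only from the lines quoted above, and the function names are made up. It first re-joins entries that the mail client wrapped, then collects the "is unreachable" entries.

```python
import re

# A new log entry starts with a timestamp such as "2016-02-16 16:13:22,899 ".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
UNREACHABLE = re.compile(r"MgmtId (\d+): Req: Resource \[Host:(\d+)\] is unreachable")

SAMPLE = """\
2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-10:null) Seq 1-4458000681143369786: MgmtId
57177340185273: Req: Resource [Host:1] is unreachable: Host 1: Link is
closed
2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-10:null) Seq 1--1: MgmtId 57177340185273: Req:
Routing to peer
2016-02-16 16:13:22,905 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-12:null) Seq 3-2144839322535198778: MgmtId
57177340185273: Req: Resource [Host:3] is unreachable: Host 3: Link is
closed
"""


def join_wrapped(text):
    """Merge continuation lines into the log entry they belong to."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def unreachable_hosts(text):
    """Return (management id, host id) pairs for every 'is unreachable' entry."""
    hits = []
    for entry in join_wrapped(text):
        match = UNREACHABLE.search(entry)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


print(unreachable_hosts(SAMPLE))  # [(57177340185273, 1), (57177340185273, 3)]
```

Both unreachable hosts are reported by the same management id, 57177340185273.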

Here's a longer excerpt from the logfile during startup:

http://pastebin.com/SftVJCs4

Maybe someone knows how to resolve this? To me it looks like our single
management-host has some kind of identity crisis? 


On Tuesday, 16.02.2016, at 15:12 +0100, Stephan Seitz wrote:
> Hi acs gurus!
> 
> We're currently facing a really strange problem after two somewhat
> simple steps.
> 1. Reboot Management-Node (well there is also a 2nd. NFS-Storage
> located)
> 2. Upgrade 4.7.0 to 4.7.1
> 
> Both steps seemed successful and running, but after a few days I've
> noticed the SSVM in "running, not connected" state, so I decided to
> restart the SSVM. That's where all the trouble began...
>
> I've pasted a somewhat repetitive log excerpt here
> http://pastebin.com/8MM6XUBk
>
> If I try to (force) reconnect a host, we're getting huge repetitive log
> entries like pasted here http://pastebin.com/cNR3TtkG
> 
> Cloudmonkey quits with following Response:
> 
> (local) 🐡 > reconnect host id=df4182f8-24a0-40ca-9ccc-6489f374cd4c
> Error Connection refused by server: ('Connection aborted.',
> BadStatusLine("''",))
> 
> 
> I've tcpdump'ed relevant traffic between management and xenservers and
> found simply nothing except some (I assume) unrelated NFS packets.
>
> Could someone please shed some light on how to fix that?
> 
> Thanks in advance!
> 
> - Stephan


