Posted to dev@cloudstack.apache.org by Marty Sweet <ms...@gmail.com> on 2013/10/22 22:39:44 UTC

CS-Management HA Networking

Hi Guys.

I am planning on upgrading my 4.1.1 infrastructure to 4.2 over the weekend.

When testing my 4.1.1 setup I ran across a problem where a TOR switch
failure would cause an outage of the management server. The agents use 2
bonded NICs for all management traffic.
When I tried to configure the management server to use bond0 in simple
active-passive mode (as I do for my agent management network),
cloudstack-management would not start due to 'Integrity Issues', which at
the time I traced back to an IntegrityChecker that ensures the management
server's IP is held by an eth*, em* or similar interface.

My question is: does this limitation still exist and, if so, can it be
overcome by adding bond* to the list of allowed interface names and
compiling the management server from source?
I would love to hear input on this; it seems bizarre to me that it is
difficult to add simple but effective network redundancy to the management
server.

For context, this is the basic redundant network setup I have for my
agents:
4x KVM Hosts all with 4 NICs - 2 bonds (Private/Public Traffic)

Example Host:
------------------Interconnect---------------
      TOR 1      ---------      TOR 2
---------------------          ---------------------
          |      Management      |
          |     Tagged VLANs    |
----------------------------------------------------
       KVM Cloudstack Hypervisor
----------------------------------------------------
          |      Public Traffic         |
          |      Tagged VLANS     |
          |      LACP Aggregation |
----------------------------------------------------
                Core Router
----------------------------------------------------

There are also LACP links with STP rules between the TOR switches and the
core device to allow for interconnect failure, so the TORs do not become
isolated, but I have excluded those for simplicity.


I would have thought it would be easy to create a bond for my management
node and connect its two NICs to the two TOR switches, but that didn't
work in 4.1.1 for the reasons above.

Thanks!
Marty

Re: CS-Management HA Networking

Posted by Darren Shepherd <da...@gmail.com>.
I was thinking the same thing.  Right now there's really not much logic around it; it just grabs the first one.

One problem that may arise has to do with initialization order.  It may be that database access is not yet available at the time the MAC address is read.  Either way, the result should be deterministic regardless of OS interface order.  I'll look at the code, probably sometime next week.
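
As a rough illustration of "deterministic regardless of OS interface order", here is a minimal Java sketch. It assumes the candidate interfaces have already been collected into a name-to-MAC map. The class DeterministicMacPicker and its pick method are hypothetical names, not CloudStack code, and the policy shown (sort by interface name, skip all-zero addresses) is only one possible choice.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of CloudStack.
final class DeterministicMacPicker {
    private static final String ZERO_MAC = "00:00:00:00:00:00";

    // Returns the MAC of the alphabetically first interface whose MAC is
    // not all zeros, so the result does not depend on enumeration order.
    static Optional<String> pick(Map<String, String> macByInterface) {
        // A TreeMap iterates its keys in sorted order whatever the input order.
        Map<String, String> sorted = new TreeMap<>(macByInterface);
        for (String mac : sorted.values()) {
            if (!ZERO_MAC.equals(mac)) {
                return Optional.of(mac);
            }
        }
        return Optional.empty();
    }
}
```

With this policy an unused bond0 reporting 00:00:00:00:00:00 is skipped even though it sorts first, and reordering the interfaces in the input does not change the answer.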

Darren

> On Oct 26, 2013, at 11:09 AM, Marty Sweet <ms...@gmail.com> wrote:
> 
> Possibly. I would say it makes more sense to find which interface the
> cluster.node.ip (I think) is using, then get the MAC address from that
> interface.
> Then, if users add interfaces or change their order through udev, the
> result will stay stable (to an extent).
> 
> The only problem I can think of is if it's using loopback, which it
> shouldn't be if the management setup has been run.
> 
> Marty
> 
> 
> On Sat, Oct 26, 2013 at 6:08 PM, Darren Shepherd <
> darren.s.shepherd@gmail.com> wrote:
> 
>> Glad that helped.  It seems we should change CloudStack to ignore MAC
>> addresses that are 00:..:00.  If you want to file a bug you can assign it
>> to me and I'll look into changing that.
>> 
>> Darren
>> 
>>> On Oct 26, 2013, at 5:35 AM, Marty Sweet <ms...@gmail.com> wrote:
>>> 
>>> Hi Darren, thanks for the heads up about that script.
>>> 
>>> Old Networking Setup:
>>> eth0 eth1 -> management0
>>> management0.11 -> vlan11
>>> management0.12 -> vlan12
>>> 
>>> It turns out that, in true Ubuntu networking fashion, bond0 was being
>>> created for no reason and was appearing in ifconfig -a, although it was
>>> not active and could not be downed, so the script was pulling out the
>>> first MAC address it found.
>>> 
>>> bond0     Link encap:Ethernet  HWaddr 00:00:00:00:00:00
>>>         BROADCAST MASTER MULTICAST  MTU:1500  Metric:1
>>>         RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>         TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>         collisions:0 txqueuelen:0
>>>         RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>>> 
>>> Under this configuration the script returned:
>>> addr in integer is 0
>>> addr in bytes is  0 0 0 0 0 0
>>> addr in char is 00:00:00:00:00:00
>>> 
>>> 
>>> Once I used bond0 as my bond name, as opposed to management0, it started
>>> working, as the bond was now in use.
>>> New Networking Setup:
>>> eth0 eth1 -> bond0
>>> bond0.11 -> vlan11
>>> bond0.12 -> vlan12
>>> 
>>> Many thanks,
>>> Marty
>>> 
>>> 
>>>> On Wed, Oct 23, 2013 at 7:44 AM, Marty Sweet <ms...@gmail.com>
>> wrote:
>>>> 
>>>> Hi Darren,
>>>> 
>>>> Thanks for getting back to me. I will set the networking config up again
>>>> and run the commands you sent me over the next couple of days.
>>>> 
>>>> Thanks,
>>>> Marty
>>>> 
>>>> 
>>>> On Tue, Oct 22, 2013 at 11:39 PM, Darren Shepherd <
>>>> darren.s.shepherd@gmail.com> wrote:
>>>> 
>>>>> Well, that wasn't a very useful message.  If you can find the
>>>>> cloud-utils jar on your server, run
>>>>> 
>>>>> java -cp <PATH>/cloud-utils-4.1.1.jar com.cloud.utils.net.MacAddress
>>>>> 
>>>>> That will output what it's finding for the MAC address.  Also run
>>>>> "ifconfig -a" from the command line.  If you don't mind sending the
>>>>> output of "ifconfig -a", that would be helpful to see what's going
>>>>> wrong.
>>>>> 
>>>>> Darren
>>>>> 
>>>>> On Tue, Oct 22, 2013 at 2:48 PM, Marty Sweet <ms...@gmail.com>
>>>>> wrote:
>>>>>> Just noticed I didn't include the log:
>>>>>> 
>>>>>> http://pastebin.com/wUtCsSAb
>>>>>> 
>>>>>> Marty
>>>>>> 
>>>>>> 
>>>>>>> On Tue, Oct 22, 2013 at 10:38 PM, Marty Sweet <ms...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Darren,
>>>>>>> 
>>>>>>> Maybe I'm getting confused with an issue I had with the agents around
>>>>>>> that time!
>>>>>>> The error message I got was very cryptic. A fresh look at the source
>>>>>>> code:
>>>>>>> https://github.com/apache/cloudstack/blob/04cdd90a84f4be5ba02778fe0cd352a4b1c39a13/utils/src/org/apache/cloudstack/utils/identity/ManagementServerNode.java
>>>>>>> 
>>>>>>> suggests that it gets: private static final long s_nodeId =
>>>>>>> MacAddress.getMacAddress().toLong(); and fails if that value is <= 0 in
>>>>>>> the check() function, which is run by the SystemIntegrityChecker.
>>>>>>> 
>>>>>>> Hopefully it is just a MAC address issue. What would the
>>>>>>> IntegrityChecker be looking for?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Marty
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Oct 22, 2013 at 10:02 PM, Darren Shepherd <
>>>>>>> darren.s.shepherd@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Do you have a specific error from a log?  I was not aware that
>>>>>>>> CloudStack would look for interfaces w/ eth*, em*.  In the code it
>>>>>>>> just does "ifconfig -a" to list the devices.  By creating a bond,
>>>>>>>> the MAC address CloudStack finds will probably change, and I could
>>>>>>>> imagine that causing something to fail.
>>>>>>>> 
>>>>>>>> Darren
>>>>>>>> 
>> 

Re: CS-Management HA Networking

Posted by Marty Sweet <ms...@gmail.com>.
Possibly. I would say it makes more sense to find which interface the
cluster.node.ip (I think) is using, then get the MAC address from that
interface.
Then, if users add interfaces or change their order through udev, the
result will stay stable (to an extent).

The only problem I can think of is if it's using loopback, which it
shouldn't be if the management setup has been run.
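
A minimal Java sketch of that idea, using only the standard java.net API: look up the interface that holds the node's IP, refuse loopback, and read the MAC from that interface. The class NodeIpMacLookup and its methods are hypothetical names for illustration, not CloudStack code, and how the IP is read from configuration is left out.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Optional;

// Hypothetical helper for illustration; not part of CloudStack.
final class NodeIpMacLookup {
    // Formats raw hardware-address bytes as lower-case aa:bb:cc:dd:ee:ff.
    static String format(byte[] mac) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < mac.length; i++) {
            if (i > 0) {
                sb.append(':');
            }
            sb.append(String.format("%02x", mac[i] & 0xff));
        }
        return sb.toString();
    }

    // Returns the MAC of the interface that holds nodeIp, or empty if no
    // interface has that address, the address sits on loopback, or the
    // interface exposes no hardware address.
    static Optional<String> macFor(InetAddress nodeIp) throws SocketException {
        NetworkInterface nic = NetworkInterface.getByInetAddress(nodeIp);
        if (nic == null || nic.isLoopback()) {
            return Optional.empty();
        }
        byte[] mac = nic.getHardwareAddress();
        if (mac == null) {
            return Optional.empty();
        }
        return Optional.of(format(mac));
    }
}
```

Because the lookup is keyed on the IP rather than on enumeration order, adding or renaming unrelated interfaces should not change the result, as long as the node's IP stays on the same interface.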

Marty



Re: CS-Management HA Networking

Posted by Darren Shepherd <da...@gmail.com>.
Glad that helped.  It seems we should change CloudStack to ignore MAC addresses that are 00:..:00.  If you want to file a bug you can assign it to me and I'll look into changing that.
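
For illustration only, a minimal Java sketch of what "ignore MAC addresses that are 00:..:00" could look like when scanning "ifconfig -a" output. It assumes the net-tools "HWaddr" format shown elsewhere in this thread. The class IfconfigMacScanner is a hypothetical name and is not the actual com.cloud.utils.net.MacAddress implementation.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the CloudStack implementation.
final class IfconfigMacScanner {
    // Matches the "HWaddr xx:xx:xx:xx:xx:xx" field of net-tools ifconfig output.
    private static final Pattern HWADDR =
            Pattern.compile("HWaddr\\s+((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})");

    // Returns the first MAC in the output that is not all zeros, lower-cased.
    static Optional<String> firstUsableMac(String ifconfigOutput) {
        Matcher m = HWADDR.matcher(ifconfigOutput);
        while (m.find()) {
            String mac = m.group(1).toLowerCase();
            if (!mac.equals("00:00:00:00:00:00")) {
                return Optional.of(mac);
            }
        }
        return Optional.empty();
    }
}
```

Fed the inactive bond0 entry followed by a real NIC, this skips bond0 and returns the NIC's address; fed only the bond0 entry, it returns empty, so the caller can fail with a clearer message.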

Darren


Re: CS-Management HA Networking

Posted by Marty Sweet <ms...@gmail.com>.
Hi Darren, thanks for the heads up about that script.

Old Networking Setup:
eth0 eth1 -> management0
management0.11 -> vlan11
management0.12 -> vlan12

It turns out that, in true Ubuntu networking fashion, bond0 was being
created for no reason and was appearing in ifconfig -a, although it was not
active and could not be downed, so the script was pulling out the first
MAC address it found.

bond0     Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          BROADCAST MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

Under this configuration the script returned:
addr in integer is 0
addr in bytes is  0 0 0 0 0 0
addr in char is 00:00:00:00:00:00
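
To show why a zero result here would stop the management server, a minimal Java sketch, assuming (as the source linked elsewhere in this thread suggests) that the node id is the MAC packed into a long and must be positive. NodeIdCheck, macToLong and check are hypothetical names for illustration; this is not the CloudStack source.

```java
// Hypothetical reconstruction for illustration; not the CloudStack source.
final class NodeIdCheck {
    // Packs a colon-separated MAC into a long, most significant octet first.
    static long macToLong(String mac) {
        long value = 0;
        for (String octet : mac.split(":")) {
            value = (value << 8) | Integer.parseInt(octet, 16);
        }
        return value;
    }

    // Rejects a node id that is not positive, which is what an all-zero
    // MAC address produces.
    static void check(long nodeId) {
        if (nodeId <= 0) {
            throw new IllegalStateException("Unable to derive a management server node id");
        }
    }
}
```

Calling check(macToLong("00:00:00:00:00:00")) throws, while any MAC with at least one non-zero octet yields a positive id and passes.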


Once I used bond0 as my bond name, as opposed to management0, it started
working, as the bond was now in use.
New Networking Setup:
eth0 eth1 -> bond0
bond0.11 -> vlan11
bond0.12 -> vlan12

Many thanks,
Marty



Re: CS-Management HA Networking

Posted by Marty Sweet <ms...@gmail.com>.
Hi Darren,

Thanks for getting back to me. I will set the networking config up again
and run the commands you sent me over the next couple of days.

Thanks,
Marty


On Tue, Oct 22, 2013 at 11:39 PM, Darren Shepherd <
darren.s.shepherd@gmail.com> wrote:

> Well, that wasn't a very useful message.  If you can find the
> cloud-utils jar on your server, run
>
> java -cp <PATH>/cloud-utils-4.1.1.jar com.cloud.utils.net.MacAddress
>
> That will output what it's finding for the MAC address.  Also run
> "ifconfig -a" from the command line.  If you don't mind sending the
> output of "ifconfig -a", that would be helpful to see what's going
> wrong.
>
> Darren
>
> On Tue, Oct 22, 2013 at 2:48 PM, Marty Sweet <ms...@gmail.com> wrote:
> > Just noticed I didn't include the log:
> >
> > http://pastebin.com/wUtCsSAb
> >
> > Marty
> >
> >
> > On Tue, Oct 22, 2013 at 10:38 PM, Marty Sweet <ms...@gmail.com>
> wrote:
> >
> >> Hi Darren,
> >>
> >> Maybe I'm getting confused with an issue I had with the agents around
> >> that time!
> >> The error message I got was very cryptic. A fresh look at the source
> >> code:
> >>
> >> https://github.com/apache/cloudstack/blob/04cdd90a84f4be5ba02778fe0cd352a4b1c39a13/utils/src/org/apache/cloudstack/utils/identity/ManagementServerNode.java
> >>
> >> suggests that it gets: private static final long s_nodeId =
> >> MacAddress.getMacAddress().toLong(); and fails if that value is <= 0 in
> >> the check() function, which is run by the SystemIntegrityChecker.
> >>
> >> Hopefully it is just a MAC address issue. What would the
> >> IntegrityChecker be looking for?
> >>
> >> Thanks,
> >> Marty
> >>
> >>
> >> On Tue, Oct 22, 2013 at 10:02 PM, Darren Shepherd <
> >> darren.s.shepherd@gmail.com> wrote:
> >>
> >>> Do you have a specific error from a log?  I was not aware that
> >>> CloudStack would look for interfaces w/ eth*, em*.  In the code it
> >>> just does "ifconfig -a" to list the devices.  By creating a bond, the
> >>> MAC address CloudStack finds will probably change, and I could imagine
> >>> that causing something to fail.
> >>>
> >>> Darren
> >>>
> >>> On Tue, Oct 22, 2013 at 1:39 PM, Marty Sweet <ms...@gmail.com>
> >>> wrote:
> >>> > Hi Guys.
> >>> >
> >>> > I am planning on upgrading my 4.1.1 infrastructure to 4.2 over the
> >>> weekend.
> >>> >
> >>> > When testing my 4.1.1 setup I ran across a problem where a TOR switch
> >>> > failure would cause an outage to the management server. The agents
> use 2
> >>> > NICs for all management traffic using bonds.
> >>> > When I tried to configure the management server to use a bond0 in
> simple
> >>> > active-passive mode (like I use for my agent management network),
> >>> > cloudstack-management would not start due to 'Integrity Issues',
> which
> >>> at
> >>> > the time I located back to a IntegitryChecker which ensures the
> >>> interfaces
> >>> > of eth* em* or some others were taking the IP of management server.
> >>> >
> >>> > My question is does this limitation still exist and if so, can it be
> >>> > overcome by adding bond* to the list of allowed interface names and
> >>> > compiling the management server from source?
> >>> > I would love to hear input to this, it seems bizarre to me that it is
> >>> > difficult to add simple but effective network redundancy to the
> >>> management
> >>> > server.
> >>> >
> >>> > For scenario basis, this is the basic redundant network setup I have
> >>> for my
> >>> > Agents:
> >>> > 4x KVM Hosts all with 4 NICs - 2 bonds (Private/Public Traffic)
> >>> >
> >>> > Example Host:
> >>> > ------------------Interconnect---------------
> >>> >       TOR 1      ---------      TOR 2
> >>> > ---------------------          ---------------------
> >>> >           |      Management      |
> >>> >           |     Tagged VLANs    |
> >>> > ----------------------------------------------------
> >>> >        KVM Cloudstack Hypervisor
> >>> > ----------------------------------------------------
> >>> >           |      Public Traffic         |
> >>> >           |      Tagged VLANS     |
> >>> >           |      LACP Aggregation |
> >>> > ----------------------------------------------------
> >>> >                 Core Router
> >>> > ----------------------------------------------------
> >>> >
> >>> > There are also LACP links with STP rules between the TOR switches are
> >>> the
> >>> > core device to allow for interconnect failure so the TORs do not
> become
> >>> > isolated, but I have excluded that for simplicity.
> >>> >
> >>> >
> >>> > I would have thought it would be easy to create a bond for my
> management
> >>> > node and connect the two NICs to both the TOR switches, but that
> didn't
> >>> > work in 4.1.1 due to my reasons above.
> >>> >
> >>> > Thanks!
> >>> > Marty
> >>>
> >>
> >>
>

Re: CS-Management HA Networking

Posted by Darren Shepherd <da...@gmail.com>.
Well, that wasn't a very useful message.  If you can find the cloud-utils
jar on your server, run

java -cp <PATH>/cloud-utils-4.1.1.jar com.cloud.utils.net.MacAddress

That will output what it's finding for the MAC address.  Also run
"ifconfig -a" from the command line.  If you don't mind sending the
output of "ifconfig -a", that would help to see what's going
wrong.
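
As a rough cross-check alongside the two commands above, here is a minimal
sketch that lists each interface and its MAC address using the standard
java.net.NetworkInterface API.  This is not how CloudStack's MacAddress
class works (that one reads "ifconfig -a"); the class name ListMacs and
the helper formatMac are made up for illustration.  It only shows which
MAC the JVM reports for bond0 and its slave NICs.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Enumeration;

// Hypothetical diagnostic, not part of CloudStack.
public class ListMacs {
    // Format a hardware address as colon-separated lowercase hex.
    static String formatMac(byte[] mac) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < mac.length; i++) {
            if (i > 0) {
                sb.append(':');
            }
            sb.append(String.format("%02x", mac[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws SocketException {
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            System.out.println("no network interfaces found");
            return;
        }
        while (nics.hasMoreElements()) {
            NetworkInterface nic = nics.nextElement();
            // getHardwareAddress() returns null for loopback and some virtual devices.
            byte[] mac = nic.getHardwareAddress();
            System.out.println(nic.getName() + " -> "
                    + (mac == null ? "(no MAC)" : formatMac(mac)));
        }
    }
}
```

Comparing this output with "ifconfig -a" before and after enabling the
bond should show whether the reported MAC changes.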

Darren

On Tue, Oct 22, 2013 at 2:48 PM, Marty Sweet <ms...@gmail.com> wrote:
> Just noticed I didn't include the log:
>
> http://pastebin.com/wUtCsSAb
>
> Marty

Re: CS-Management HA Networking

Posted by Marty Sweet <ms...@gmail.com>.
Just noticed I didn't include the log:

http://pastebin.com/wUtCsSAb

Marty


On Tue, Oct 22, 2013 at 10:38 PM, Marty Sweet <ms...@gmail.com> wrote:

> Hi Darren,
>
> Maybe I'm getting confused with an issue I had with the Agents around that
> time!
> The error message I got was very cryptic. Having a fresh look at the
> source code:
>
>
> https://github.com/apache/cloudstack/blob/04cdd90a84f4be5ba02778fe0cd352a4b1c39a13/utils/src/org/apache/cloudstack/utils/identity/ManagementServerNode.java
>
> This would suggest that it gets: private static final long s_nodeId =
> MacAddress.getMacAddress().toLong(); and fails if it's <= 0 in the check()
> function, which is run by the SystemIntegrityChecker.
>
> Hopefully it is just a MAC Address issue, what would the IntegrityChecker
> be looking for?
>
> Thanks,
> Marty

Re: CS-Management HA Networking

Posted by Marty Sweet <ms...@gmail.com>.
Hi Darren,

Maybe I'm getting confused with an issue I had with the Agents around that
time!
The error message I got was very cryptic. Having a fresh look at the source
code:

https://github.com/apache/cloudstack/blob/04cdd90a84f4be5ba02778fe0cd352a4b1c39a13/utils/src/org/apache/cloudstack/utils/identity/ManagementServerNode.java

This would suggest that it gets: private static final long s_nodeId =
MacAddress.getMacAddress().toLong(); and fails if it's <= 0 in the check()
function, which is run by the SystemIntegrityChecker.

Hopefully it is just a MAC address issue. What would the IntegrityChecker
be looking for?
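
To make my reading of that check concrete, here is a minimal standalone
sketch.  It is not the CloudStack source: the class name NodeIdSketch, the
helper macToLong, the sample MAC and the exception message are all made up
for illustration.  It assumes the node id is simply the MAC bytes packed
into a long, and that startup is refused when that value is <= 0.

```java
// Hypothetical illustration of the node-id check, not CloudStack code.
public class NodeIdSketch {
    // Pack the MAC bytes (most significant first) into a long.
    static long macToLong(byte[] mac) {
        long value = 0;
        for (byte b : mac) {
            value = (value << 8) | (b & 0xff);
        }
        return value;
    }

    // Mirrors the idea of check(): refuse to continue without a positive node id.
    static void check(long nodeId) {
        if (nodeId <= 0) {
            throw new IllegalStateException("node id must be positive, got " + nodeId);
        }
    }

    public static void main(String[] args) {
        byte[] sampleMac = {0x00, 0x1a, 0x2b, 0x3c, 0x4d, 0x5e}; // made-up address
        long nodeId = macToLong(sampleMac);
        check(nodeId); // passes: any non-zero 6-byte MAC gives a positive value
        System.out.println("node id = " + nodeId);

        try {
            check(macToLong(new byte[6])); // an all-zero MAC yields 0
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If that reading is right, the interface name would not matter; the check
would only fail when no usable (non-zero) MAC address is found.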

Thanks,
Marty

On Tue, Oct 22, 2013 at 10:02 PM, Darren Shepherd <
darren.s.shepherd@gmail.com> wrote:

> Do you have a specific error from a log?  I was not aware that
> CloudStack would look for interfaces w/ eth*, em*.  In the code it
> just does "ifconfig -a" to list the devices.  By creating a bond, the
> MAC address CloudStack finds will probably change, and I could imagine
> something failing as a result.
>
> Darren

Re: CS-Management HA Networking

Posted by Darren Shepherd <da...@gmail.com>.
Do you have a specific error from a log?  I was not aware that
CloudStack would look for interfaces w/ eth*, em*.  In the code it
just does "ifconfig -a" to list the devices.  By creating a bond, the
MAC address CloudStack finds will probably change, and I could imagine
something failing as a result.

Darren

On Tue, Oct 22, 2013 at 1:39 PM, Marty Sweet <ms...@gmail.com> wrote:
> Hi Guys.
>
> I am planning on upgrading my 4.1.1 infrastructure to 4.2 over the weekend.
>
> When testing my 4.1.1 setup I ran across a problem where a TOR switch
> failure would cause an outage to the management server. The agents use 2
> NICs for all management traffic using bonds.
> When I tried to configure the management server to use a bond0 in simple
> active-passive mode (like I use for my agent management network),
> cloudstack-management would not start due to 'Integrity Issues', which at
> the time I traced back to an IntegrityChecker which ensures the interfaces
> of eth* em* or some others were taking the IP of the management server.
>
> My question is does this limitation still exist and if so, can it be
> overcome by adding bond* to the list of allowed interface names and
> compiling the management server from source?
> I would love to hear input to this, it seems bizarre to me that it is
> difficult to add simple but effective network redundancy to the management
> server.
>
> For scenario basis, this is the basic redundant network setup I have for my
> Agents:
> 4x KVM Hosts all with 4 NICs - 2 bonds (Private/Public Traffic)
>
> Example Host:
> ------------------Interconnect---------------
>       TOR 1      ---------      TOR 2
> ---------------------          ---------------------
>           |      Management      |
>           |     Tagged VLANs    |
> ----------------------------------------------------
>        KVM Cloudstack Hypervisor
> ----------------------------------------------------
>           |      Public Traffic         |
>           |      Tagged VLANS     |
>           |      LACP Aggregation |
> ----------------------------------------------------
>                 Core Router
> ----------------------------------------------------
>
> There are also LACP links with STP rules between the TOR switches and the
> core device to allow for interconnect failure so the TORs do not become
> isolated, but I have excluded that for simplicity.
>
>
> I would have thought it would be easy to create a bond for my management
> node and connect the two NICs to both the TOR switches, but that didn't
> work in 4.1.1 due to my reasons above.
>
> Thanks!
> Marty