You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cloudstack.apache.org by Dean Close <de...@icloudhosting.com> on 2016/04/01 20:13:45 UTC

Redundant Router Interfaces

Hi guys,

I had been investigating a possible bug with the way interfaces are managed on virtual routers. The public interfaces are being brought up on backup routers and (because they boot second) they arp the IPs away from the master. I'd been examining an idea for a fix but whilst doing so I found that the system appears to be designed to bring up these interfaces.

I suspect that a few things need to be reworked - but the changes necessary go so far against what has been implemented that I wanted to open this up before doing the work.

Hopefully if I go through my findings you guys can help me see what I might be getting wrong.

The following was correct for pre-4.6 redundant routers:
 
  1. Both routers get configured with IP addresses, routes and iptables rules.
  2. Public interfaces are initially set as DOWN.
  3. Keepalived runs a VRRP instance on the private interface (eth0) to negotiate MASTER/BACKUP roles.
  4. Keepalived manages the virtual IP on eth0 used as the public gateway for the guest VMs.
  5. Keepalived uses a master notify script to bring up the public interfaces.

The above was true for pre-4.6 routers. Now, however, things appear to work differently:

  1. Both routers get configured as before.
  2. All interfaces apart from eth1 (the Hypervisor-link interface) are set as UP.
  3. Keepalived runs a VRRP instance on the first public interface (eth2) to negotiate MASTER/BACKUP roles.
  4. Keepalived manages the virtual IP as before.
  5. Keepalived uses a master notify script to bring up the public interfaces (unnecessary)
  6. Keepalived uses a backup notify script to bring down the public interfaces (unused)

This is unexpected for the following reasons:

  1. The keepalived notify script brings the public interfaces down when transitioning to BACKUP - so how can we expect to run a VRRP instance over eth2?
  2. If interfaces are down when transitioning to BACKUP, why are they not expected to be down to begin with? (Before the router has become MASTER)
  3. Why are we running a VRRP instance over an interface with an IP that will clash with another host on the network?

The following method from the CsIP class in /opt/cloud/bin/cs/CsAddress.py confuses matters futher:

    def check_is_up(self):
         """ Ensure device is up """
         cmd = "ip link show %s | grep 'state DOWN'" % self.getDevice()
         for i in CsHelper.execute(cmd):
             if " DOWN " in i:
                 cmd2 = "ip link set %s up" % self.getDevice()
                 # If redundant only bring up public interfaces that are not eth1.
                 # Reason: private gateways are public interfaces.
                 # master.py and keepalived will deal with eth1 public interface. 
                 if self.cl.is_redundant() and (not self.is_public() or self.getDevice() not in PUBLIC_INTERFACE):
                     CsHelper.execute(cmd2)
                 # if not redundant bring everything up
                 if not self.cl.is_redundant():
                     CsHelper.execute(cmd2)
 
The comments refer to eth1 as a public interface when this is the link to the hypervisor. Indeed, PUBLIC_INTERFACE is defined on line 31 as ['eth1']. But keepalived and master.py don't influence eth1 at all. This looks like a mistake.

Lastly, the logic of this line looks flawed:

  if self.cl.is_redundant() and (not self.is_public() or self.getDevice() not in PUBLIC_INTERFACE)

As PUBLIC_INTERFACE is limited to eth1, the `not self.is_public()` will be ignored. Public IPs will never be assigned to eth1, so this line evaluates as:

  
  if self.cl.is_redundant() and (self.getDevice() not in PUBLIC_INTERFACE)

which reduces even further to:

  if self.cs.is_redundant() and self.is_control()


What would need doing
---------------------

  1. The keepalived.conf template would need to be changed to run the VRRP instance over eth0.
  2. The check_is_up method of the CsIP class should be renamed to 'bring_up_interfaces'. For redundant routers it should ignore IPs that pass is_public or needs_vrrp.
  3. The arpPing method should do nothing if the interface is down.
  4. The PUBLIC_INTERFACE constant should be either renamed or dropped altogether.
  5. Other things that I haven't considered?


I'd really appreciate any feedback on this. It's possible that I've got it all wrong but I'm suspecting not. I just don't want to tread on anyone's toes by submitting a PR that goes against what appears to be an explicit design decision.


Kind regards,

Dean Close
iCloudHosting.com
http://www.icloudhosting.com
Tel: 01582 227927

Unit 2, Smallmead Road, Reading RG2 0QS

******************************************************************
The names iCloudHosting and iCloudHosting.com are trading styles of BBS Commerce Ltd which is registered in England and Wales, Company Number 04837714. Please use our trading address above for mail. Our registered office is 5 Theale Lakes Business Park, Moulden Way, Sulhamstead, Reading, Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94.

This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you are not the intended recipient you are not authorised to and must not disclose, copy, distribute, or retain this message or any part of it.

iCloudHosting accepts no responsibility for information, errors or omissions in this email.
******************************************************************


RE: Redundant Router Interfaces

Posted by Dean Close <de...@icloudhosting.com>.
Hi Wei,

Have you noticed the PR #1413 at https://github.com/apache/cloudstack/pull/1413 ? The commit at https://github.com/apache/cloudstack/commit/8bbea5eeb6598b213b78a324c184841dbba69280 changes the check_is_up method considerably such that your patch would no-longer apply.

That PR #1413 focuses on VPC router functionality but it raises a number of other concerns for normal RvR routers:


  1. We mentioned the conflict between the code in your patch and the commit at 8bbea5
  2. Redundant routers can have multiple public interfaces. Eth2 is just the public interface where IPs on the same subnet as the source-NAT IP are assigned. When a public IP on a different subnet is assigned to a router it creates a new interface for that subnet, eth3, eth4, etc. The commit 11e61f7 in PR #1413 does not account for this.
  3. If we solve the issue with check_is_up not bringing up public interfaces, we need to consider that public interfaces may be created on RvRs after they have become master. Because the keepalived notify script will not be executed (no transition) these new interfaces will remain DOWN until a failover is triggered. From the user's perspective, these newly assigned IPs just won't work until they restart their network.
  4. Once the interface is brought up IP addresses assigned to it still won't work. Two IP tables rules are missing for the new interface to allow inbound traffic. Once those rules are added, inbound static-NAT and port-forwarding traffic works. Outbound static-NAT, however, does not work and gets NAT'd to the source-NAT IP. This is because of the connmark given to any traffic from the guest network. This connmark matches an IP rule trying to route such traffic over eth0. Public IP traffic can't go via eth0, however, so it is routed via the default gateway, via the source-NAT IP, on eth2.


I'd like to help with points 3 and 4 but it's not clear to me which version of the code should be considered 'current'. Have you noticed these issues being discussed elsewhere?

Dean


-----Original Message-----
From: Dean Close [mailto:dean.close@icloudhosting.com] 
Sent: 04 April 2016 14:36
To: dev@cloudstack.apache.org
Subject: RE: Redundant Router Interfaces

Hi Wei,

I implemented your modifications on a test cloud we have here and it worked well. Redundant routers in the backup state booted with public interfaces down.

As you said, they also negotiated the VRRP instance over eth0 which is correct.

As an aside, regarding the following code:

	if self.cl.is_redundant() and (not self.is_public() or (self.config.is_vpc() and self.getDevice() not in VPC_PUBLIC_INTERFACE)):

I expanded this out to help me make sense of the logic. I also disposed of the preceding comment block as it was misleading in non-VPC contexts. The full block I'm running here is:

                # if not redundant bring everything up
                if not self.cl.is_redundant():
                    CsHelper.execute(cmd2)
                # Otherwise only bring up non-public interfaces
                elif self.cl.is_redundant() and self.config.is_vpc() and \
                    (self.getDevice() not in VPC_PUBLIC_INTERFACE):
                    CsHelper.execute(cmd2)
                elif self.cl.is_redundant() and not self.is_public():
                    CsHelper.execute(cmd2)


Dean

-----Original Message-----
From: Wei ZHOU [mailto:ustcweizhou@gmail.com]
Sent: 02 April 2016 07:02
To: dev@cloudstack.apache.org
Subject: Re: Redundant Router Interfaces

Hi Dean,

the VRRP instance will be set to eth0 (guest device) in router, and eth2 (the first guest device ) in vpc router. I think it is good.

-Wei

2016-04-02 0:22 GMT+02:00 Dean Close <de...@icloudhosting.com>:

> Hi Wei,
>
> It looks like your patch will prevent redundant routers from bringing 
> up public interfaces. We also need to:
>
> 1. shift the VRRP instance over to eth0 as eth2 will be down on the backup.
> 2. Ensure that eth0 and eth1 are up on both redundant virtual routers.
>
> Your patch might already solve point 2. What do you think?
>
> Point 1 will require modifying the keepalived.conf template. Do you 
> want to handle this in your PR or shall I arrange that?
>
> Kind Regards,
>
> Dean Close
> iCloudHosting.com
> http://www.icloudhosting.com
> Tel: 01582 227927
>
> Spectrum House, Dunstable Road, Redbourn, AL3 7PR
>
>
> ******************************************************************
>
> The names iCloudHosting and iCloudHosting.com are trading styles of 
> BBS Commerce Ltd which is registered in England and Wales, Company 
> Number 04837714. Please use our trading address above for mail. Our 
> registered office is 5 Theale Lakes Business Park, Moulden Way, 
> Sulhamstead, Reading, Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94.
>
> This email and any files transmitted with it are confidential and 
> intended solely for the use of the individual or entity to whom they are addressed.
> If you are not the intended recipient you are not authorised to and 
> must not disclose, copy, distribute, or retain this message or any part of it.
>
> iCloudHosting accepts no responsibility for information, errors or 
> omissions in this email.
>
> ******************************************************************
>
>
>
>
> On Fri, Apr 1, 2016 at 8:34 pm, Wei ZHOU <us...@gmail.com> wrote:
> actually not only this issue, but more
> (1) restart vpc router will break the network if there are multiple 
> tiers in the vpc.
> (2) check_heartbeat.sh does not work.
>
> I will create tickets for them next Monday, and create PR on github.
>
> -Wei
>
> 2016-04-01 21:14 GMT+02:00 Wei ZHOU <us...@gmail.com>:
>
> > Dean,
> >
> > I just fixed it yesterday.
> >
> > the commit is
> >
> > ---
> >
> > diff --git
> > a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> > b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> > index 5f63c06..5256d03 100755
> > --- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> > +++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> > @@ -27,7 +27,7 @@ from CsRoute import CsRoute from CsRule import 
> > CsRule
> >
> > VRRP_TYPES = ['guest']
> > -PUBLIC_INTERFACE = ['eth1']
> > +VPC_PUBLIC_INTERFACE = ['eth1']
> >
> > class CsAddress(CsDataBag):
> >
> > @@ -323,7 +323,7 @@ class CsIP:
> > # If redundant only bring up public interfaces that are not eth1.
> > # Reason: private gateways are public interfaces.
> > # master.py and keepalived will deal with eth1 public interface.
> > - if self.cl.is_redundant() and (not self.is_public() or
> > self.getDevice() not in PUBLIC_INTERFACE):
> > + if self.cl.is_redundant() and (not self.is_public() or
> > (self.config.is_vpc() and self.getDevice() not in VPC_PUBLIC_INTERFACE)):
> > CsHelper.execute(cmd2)
> > # if not redundant bring everything up if not
> > self.cl.is_redundant():
> >
> > ---
> >
> > - Wei
> >
> > 2016-04-01 20:13 GMT+02:00 Dean Close <de...@icloudhosting.com>:
> >
> >> Hi guys,
> >>
> >> I had been investigating a possible bug with the way interfaces are 
> >> managed on virtual routers. The public interfaces are being brought 
> >> up
> on
> >> backup routers and (because they boot second) they arp the IPs away 
> >> from the master. I'd been examining an idea for a fix but whilst 
> >> doing so I found that the system appears to be designed to bring up 
> >> these
> interfaces.
> >>
> >> I suspect that a few things need to be reworked - but the changes 
> >> necessary go so far against what has been implemented that I wanted 
> >> to
> open
> >> this up before doing the work.
> >>
> >> Hopefully if I go through my findings you guys can help me see what 
> >> I might be getting wrong.
> >>
> >> The following was correct for pre-4.6 redundant routers:
> >>
> >> 1. Both routers get configured with IP addresses, routes and 
> >> iptables rules.
> >> 2. Public interfaces are initially set as DOWN.
> >> 3. Keepalived runs a VRRP instance on the private interface (eth0) 
> >> to negotiate MASTER/BACKUP roles.
> >> 4. Keepalived manages the virtual IP on eth0 used as the public 
> >> gateway for the guest VMs.
> >> 5. Keepalived uses a master notify script to bring up the public 
> >> interfaces.
> >>
> >> The above was true for pre-4.6 routers. Now, however, things appear 
> >> to work differently:
> >>
> >> 1. Both routers get configured as before.
> >> 2. All interfaces apart from eth1 (the Hypervisor-link interface) 
> >> are set as UP.
> >> 3. Keepalived runs a VRRP instance on the first public interface
> >> (eth2) to negotiate MASTER/BACKUP roles.
> >> 4. Keepalived manages the virtual IP as before.
> >> 5. Keepalived uses a master notify script to bring up the public 
> >> interfaces (unnecessary) 6. Keepalived uses a backup notify script 
> >> to bring down the public interfaces (unused)
> >>
> >> This is unexpected for the following reasons:
> >>
> >> 1. The keepalived notify script brings the public interfaces down 
> >> when transitioning to BACKUP - so how can we expect to run a VRRP 
> >> instance
> over
> >> eth2?
> >> 2. If interfaces are down when transitioning to BACKUP, why are 
> >> they not expected to be down to begin with? (Before the router has 
> >> become
> MASTER)
> >> 3. Why are we running a VRRP instance over an interface with an IP 
> >> that will clash with another host on the network?
> >>
> >> The following method from the CsIP class in 
> >> /opt/cloud/bin/cs/CsAddress.py confuses matters futher:
> >>
> >> def check_is_up(self):
> >> """ Ensure device is up """
> >> cmd = "ip link show %s | grep 'state DOWN'" % self.getDevice() for 
> >> i in CsHelper.execute(cmd):
> >> if " DOWN " in i:
> >> cmd2 = "ip link set %s up" % self.getDevice() # If redundant only 
> >> bring up public interfaces that are not eth1.
> >> # Reason: private gateways are public interfaces.
> >> # master.py and keepalived will deal with eth1 public interface.
> >> if self.cl.is_redundant() and (not self.is_public() or
> >> self.getDevice() not in PUBLIC_INTERFACE):
> >> CsHelper.execute(cmd2)
> >> # if not redundant bring everything up if not
> >> self.cl.is_redundant():
> >> CsHelper.execute(cmd2)
> >>
> >> The comments refer to eth1 as a public interface when this is the 
> >> link
> to
> >> the hypervisor. Indeed, PUBLIC_INTERFACE is defined on line 31 as
> ['eth1'].
> >> But keepalived and master.py don't influence eth1 at all. This 
> >> looks
> like a
> >> mistake.
> >>
> >> Lastly, the logic of this line looks flawed:
> >>
> >> if self.cl.is_redundant() and (not self.is_public() or
> >> self.getDevice() not in PUBLIC_INTERFACE)
> >>
> >> As PUBLIC_INTERFACE is limited to eth1, the `not self.is_public()` 
> >> will be ignored. Public IPs will never be assigned to eth1, so this 
> >> line evaluates as:
> >>
> >>
> >> if self.cl.is_redundant() and (self.getDevice() not in
> >> PUBLIC_INTERFACE)
> >>
> >> which reduces even further to:
> >>
> >> if self.cs.is_redundant() and self.is_control()
> >>
> >>
> >> What would need doing
> >> ---------------------
> >>
> >> 1. The keepalived.conf template would need to be changed to run the 
> >> VRRP instance over eth0.
> >> 2. The check_is_up method of the CsIP class should be renamed to 
> >> 'bring_up_interfaces'. For redundant routers it should ignore IPs 
> >> that
> pass
> >> is_public or needs_vrrp.
> >> 3. The arpPing method should do nothing if the interface is down.
> >> 4. The PUBLIC_INTERFACE constant should be either renamed or 
> >> dropped altogether.
> >> 5. Other things that I haven't considered?
> >>
> >>
> >> I'd really appreciate any feedback on this. It's possible that I've 
> >> got it all wrong but I'm suspecting not. I just don't want to tread 
> >> on
> anyone's
> >> toes by submitting a PR that goes against what appears to be an 
> >> explicit design decision.
> >>
> >>
> >> Kind regards,
> >>
> >> Dean Close
> >> iCloudHosting.com
> >> http://www.icloudhosting.com
> >> Tel: 01582 227927
> >>
> >> Unit 2, Smallmead Road, Reading RG2 0QS
> >>
> >> ******************************************************************
> >> The names iCloudHosting and iCloudHosting.com are trading styles of 
> >> BBS Commerce Ltd which is registered in England and Wales, Company 
> >> Number 04837714. Please use our trading address above for mail. Our 
> >> registered office is 5 Theale Lakes Business Park, Moulden Way, 
> >> Sulhamstead,
> Reading,
> >> Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94.
> >>
> >> This email and any files transmitted with it are confidential and 
> >> intended solely for the use of the individual or entity to whom 
> >> they are addressed. If you are not the intended recipient you are 
> >> not authorised
> to
> >> and must not disclose, copy, distribute, or retain this message or 
> >> any
> part
> >> of it.
> >>
> >> iCloudHosting accepts no responsibility for information, errors or 
> >> omissions in this email.
> >> ******************************************************************
> >>
> >>
> >
>

RE: Redundant Router Interfaces

Posted by Dean Close <de...@icloudhosting.com>.
Hi Wei,

I implemented your modifications on a test cloud we have here and it worked well. Redundant routers in the backup state booted with public interfaces down.

As you said, they also negotiated the VRRP instance over eth0 which is correct.

As an aside, regarding the following code:

	if self.cl.is_redundant() and (not self.is_public() or (self.config.is_vpc() and self.getDevice() not in VPC_PUBLIC_INTERFACE)):

I expanded this out to help me make sense of the logic. I also disposed of the preceding comment block as it was misleading in non-VPC contexts. The full block I'm running here is:

                # if not redundant bring everything up
                if not self.cl.is_redundant():
                    CsHelper.execute(cmd2)
                # Otherwise only bring up non-public interfaces
                elif self.cl.is_redundant() and self.config.is_vpc() and \
                    (self.getDevice() not in VPC_PUBLIC_INTERFACE):
                    CsHelper.execute(cmd2)
                elif self.cl.is_redundant() and not self.is_public():
                    CsHelper.execute(cmd2)


Dean

-----Original Message-----
From: Wei ZHOU [mailto:ustcweizhou@gmail.com] 
Sent: 02 April 2016 07:02
To: dev@cloudstack.apache.org
Subject: Re: Redundant Router Interfaces

Hi Dean,

the VRRP instance will be set to eth0 (guest device) in router, and eth2 (the first guest device ) in vpc router. I think it is good.

-Wei

2016-04-02 0:22 GMT+02:00 Dean Close <de...@icloudhosting.com>:

> Hi Wei,
>
> It looks like your patch will prevent redundant routers from bringing 
> up public interfaces. We also need to:
>
> 1. shift the VRRP instance over to eth0 as eth2 will be down on the backup.
> 2. Ensure that eth0 and eth1 are up on both redundant virtual routers.
>
> Your patch might already solve point 2. What do you think?
>
> Point 1 will require modifying the keepalived.conf template. Do you 
> want to handle this in your PR or shall I arrange that?
>
> Kind Regards,
>
> Dean Close
> iCloudHosting.com
> http://www.icloudhosting.com
> Tel: 01582 227927
>
> Spectrum House, Dunstable Road, Redbourn, AL3 7PR
>
>
> ******************************************************************
>
> The names iCloudHosting and iCloudHosting.com are trading styles of 
> BBS Commerce Ltd which is registered in England and Wales, Company 
> Number 04837714. Please use our trading address above for mail. Our 
> registered office is 5 Theale Lakes Business Park, Moulden Way, 
> Sulhamstead, Reading, Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94.
>
> This email and any files transmitted with it are confidential and 
> intended solely for the use of the individual or entity to whom they are addressed.
> If you are not the intended recipient you are not authorised to and 
> must not disclose, copy, distribute, or retain this message or any part of it.
>
> iCloudHosting accepts no responsibility for information, errors or 
> omissions in this email.
>
> ******************************************************************
>
>
>
>
> On Fri, Apr 1, 2016 at 8:34 pm, Wei ZHOU <us...@gmail.com> wrote:
> actually not only this issue, but more
> (1) restart vpc router will break the network if there are multiple 
> tiers in the vpc.
> (2) check_heartbeat.sh does not work.
>
> I will create tickets for them next Monday, and create PR on github.
>
> -Wei
>
> 2016-04-01 21:14 GMT+02:00 Wei ZHOU <us...@gmail.com>:
>
> > Dean,
> >
> > I just fixed it yesterday.
> >
> > the commit is
> >
> > ---
> >
> > diff --git 
> > a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> > b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> > index 5f63c06..5256d03 100755
> > --- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> > +++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> > @@ -27,7 +27,7 @@ from CsRoute import CsRoute from CsRule import 
> > CsRule
> >
> > VRRP_TYPES = ['guest']
> > -PUBLIC_INTERFACE = ['eth1']
> > +VPC_PUBLIC_INTERFACE = ['eth1']
> >
> > class CsAddress(CsDataBag):
> >
> > @@ -323,7 +323,7 @@ class CsIP:
> > # If redundant only bring up public interfaces that are not eth1.
> > # Reason: private gateways are public interfaces.
> > # master.py and keepalived will deal with eth1 public interface.
> > - if self.cl.is_redundant() and (not self.is_public() or
> > self.getDevice() not in PUBLIC_INTERFACE):
> > + if self.cl.is_redundant() and (not self.is_public() or
> > (self.config.is_vpc() and self.getDevice() not in VPC_PUBLIC_INTERFACE)):
> > CsHelper.execute(cmd2)
> > # if not redundant bring everything up if not 
> > self.cl.is_redundant():
> >
> > ---
> >
> > - Wei
> >
> > 2016-04-01 20:13 GMT+02:00 Dean Close <de...@icloudhosting.com>:
> >
> >> Hi guys,
> >>
> >> I had been investigating a possible bug with the way interfaces are 
> >> managed on virtual routers. The public interfaces are being brought 
> >> up
> on
> >> backup routers and (because they boot second) they arp the IPs away 
> >> from the master. I'd been examining an idea for a fix but whilst 
> >> doing so I found that the system appears to be designed to bring up 
> >> these
> interfaces.
> >>
> >> I suspect that a few things need to be reworked - but the changes 
> >> necessary go so far against what has been implemented that I wanted 
> >> to
> open
> >> this up before doing the work.
> >>
> >> Hopefully if I go through my findings you guys can help me see what 
> >> I might be getting wrong.
> >>
> >> The following was correct for pre-4.6 redundant routers:
> >>
> >> 1. Both routers get configured with IP addresses, routes and 
> >> iptables rules.
> >> 2. Public interfaces are initially set as DOWN.
> >> 3. Keepalived runs a VRRP instance on the private interface (eth0) 
> >> to negotiate MASTER/BACKUP roles.
> >> 4. Keepalived manages the virtual IP on eth0 used as the public 
> >> gateway for the guest VMs.
> >> 5. Keepalived uses a master notify script to bring up the public 
> >> interfaces.
> >>
> >> The above was true for pre-4.6 routers. Now, however, things appear 
> >> to work differently:
> >>
> >> 1. Both routers get configured as before.
> >> 2. All interfaces apart from eth1 (the Hypervisor-link interface) 
> >> are set as UP.
> >> 3. Keepalived runs a VRRP instance on the first public interface 
> >> (eth2) to negotiate MASTER/BACKUP roles.
> >> 4. Keepalived manages the virtual IP as before.
> >> 5. Keepalived uses a master notify script to bring up the public 
> >> interfaces (unnecessary) 6. Keepalived uses a backup notify script 
> >> to bring down the public interfaces (unused)
> >>
> >> This is unexpected for the following reasons:
> >>
> >> 1. The keepalived notify script brings the public interfaces down 
> >> when transitioning to BACKUP - so how can we expect to run a VRRP 
> >> instance
> over
> >> eth2?
> >> 2. If interfaces are down when transitioning to BACKUP, why are 
> >> they not expected to be down to begin with? (Before the router has 
> >> become
> MASTER)
> >> 3. Why are we running a VRRP instance over an interface with an IP 
> >> that will clash with another host on the network?
> >>
> >> The following method from the CsIP class in 
> >> /opt/cloud/bin/cs/CsAddress.py confuses matters futher:
> >>
> >> def check_is_up(self):
> >> """ Ensure device is up """
> >> cmd = "ip link show %s | grep 'state DOWN'" % self.getDevice() for 
> >> i in CsHelper.execute(cmd):
> >> if " DOWN " in i:
> >> cmd2 = "ip link set %s up" % self.getDevice() # If redundant only 
> >> bring up public interfaces that are not eth1.
> >> # Reason: private gateways are public interfaces.
> >> # master.py and keepalived will deal with eth1 public interface.
> >> if self.cl.is_redundant() and (not self.is_public() or
> >> self.getDevice() not in PUBLIC_INTERFACE):
> >> CsHelper.execute(cmd2)
> >> # if not redundant bring everything up if not 
> >> self.cl.is_redundant():
> >> CsHelper.execute(cmd2)
> >>
> >> The comments refer to eth1 as a public interface when this is the 
> >> link
> to
> >> the hypervisor. Indeed, PUBLIC_INTERFACE is defined on line 31 as
> ['eth1'].
> >> But keepalived and master.py don't influence eth1 at all. This 
> >> looks
> like a
> >> mistake.
> >>
> >> Lastly, the logic of this line looks flawed:
> >>
> >> if self.cl.is_redundant() and (not self.is_public() or 
> >> self.getDevice() not in PUBLIC_INTERFACE)
> >>
> >> As PUBLIC_INTERFACE is limited to eth1, the `not self.is_public()` 
> >> will be ignored. Public IPs will never be assigned to eth1, so this 
> >> line evaluates as:
> >>
> >>
> >> if self.cl.is_redundant() and (self.getDevice() not in 
> >> PUBLIC_INTERFACE)
> >>
> >> which reduces even further to:
> >>
> >> if self.cs.is_redundant() and self.is_control()
> >>
> >>
> >> What would need doing
> >> ---------------------
> >>
> >> 1. The keepalived.conf template would need to be changed to run the 
> >> VRRP instance over eth0.
> >> 2. The check_is_up method of the CsIP class should be renamed to 
> >> 'bring_up_interfaces'. For redundant routers it should ignore IPs 
> >> that
> pass
> >> is_public or needs_vrrp.
> >> 3. The arpPing method should do nothing if the interface is down.
> >> 4. The PUBLIC_INTERFACE constant should be either renamed or 
> >> dropped altogether.
> >> 5. Other things that I haven't considered?
> >>
> >>
> >> I'd really appreciate any feedback on this. It's possible that I've 
> >> got it all wrong but I'm suspecting not. I just don't want to tread 
> >> on
> anyone's
> >> toes by submitting a PR that goes against what appears to be an 
> >> explicit design decision.
> >>
> >>
> >> Kind regards,
> >>
> >> Dean Close
> >> iCloudHosting.com
> >> http://www.icloudhosting.com
> >> Tel: 01582 227927
> >>
> >> Unit 2, Smallmead Road, Reading RG2 0QS
> >>
> >> ******************************************************************
> >> The names iCloudHosting and iCloudHosting.com are trading styles of 
> >> BBS Commerce Ltd which is registered in England and Wales, Company 
> >> Number 04837714. Please use our trading address above for mail. Our 
> >> registered office is 5 Theale Lakes Business Park, Moulden Way, 
> >> Sulhamstead,
> Reading,
> >> Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94.
> >>
> >> This email and any files transmitted with it are confidential and 
> >> intended solely for the use of the individual or entity to whom 
> >> they are addressed. If you are not the intended recipient you are 
> >> not authorised
> to
> >> and must not disclose, copy, distribute, or retain this message or 
> >> any
> part
> >> of it.
> >>
> >> iCloudHosting accepts no responsibility for information, errors or 
> >> omissions in this email.
> >> ******************************************************************
> >>
> >>
> >
>

Re: Redundant Router Interfaces

Posted by Wei ZHOU <us...@gmail.com>.
Hi Dean,

the VRRP instance will be set to eth0 (guest device) in router, and eth2
(the first guest device ) in vpc router. I think it is good.

-Wei

2016-04-02 0:22 GMT+02:00 Dean Close <de...@icloudhosting.com>:

> Hi Wei,
>
> It looks like your patch will prevent redundant routers from bringing up
> public interfaces. We also need to:
>
> 1. shift the VRRP instance over to eth0 as eth2 will be down on the backup.
> 2. Ensure that eth0 and eth1 are up on both redundant virtual routers.
>
> Your patch might already solve point 2. What do you think?
>
> Point 1 will require modifying the keepalived.conf template. Do you want
> to handle this in your PR or shall I arrange that?
>
> Kind Regards,
>
> Dean Close
> iCloudHosting.com
> http://www.icloudhosting.com
> Tel: 01582 227927
>
> Spectrum House, Dunstable Road, Redbourn, AL3 7PR
>
>
> ******************************************************************
>
> The names iCloudHosting and iCloudHosting.com are trading styles of BBS
> Commerce Ltd which is registered in England and Wales, Company Number
> 04837714. Please use our trading address above for mail. Our registered
> office is 5 Theale Lakes Business Park, Moulden Way, Sulhamstead, Reading,
> Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94.
>
> This email and any files transmitted with it are confidential and intended
> solely for the use of the individual or entity to whom they are addressed.
> If you are not the intended recipient you are not authorised to and must
> not disclose, copy, distribute, or retain this message or any part of it.
>
> iCloudHosting accepts no responsibility for information, errors or
> omissions in this email.
>
> ******************************************************************
>
>
>
>
> On Fri, Apr 1, 2016 at 8:34 pm, Wei ZHOU <us...@gmail.com> wrote:
> actually not only this issue, but more
> (1) restart vpc router will break the network if there are multiple tiers
> in the vpc.
> (2) check_heartbeat.sh does not work.
>
> I will create tickets for them next Monday, and create PR on github.
>
> -Wei
>
> 2016-04-01 21:14 GMT+02:00 Wei ZHOU <us...@gmail.com>:
>
> > Dean,
> >
> > I just fixed it yesterday.
> >
> > the commit is
> >
> > ---
> >
> > diff --git a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> > b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> > index 5f63c06..5256d03 100755
> > --- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> > +++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> > @@ -27,7 +27,7 @@ from CsRoute import CsRoute
> > from CsRule import CsRule
> >
> > VRRP_TYPES = ['guest']
> > -PUBLIC_INTERFACE = ['eth1']
> > +VPC_PUBLIC_INTERFACE = ['eth1']
> >
> > class CsAddress(CsDataBag):
> >
> > @@ -323,7 +323,7 @@ class CsIP:
> > # If redundant only bring up public interfaces that are
> > not eth1.
> > # Reason: private gateways are public interfaces.
> > # master.py and keepalived will deal with eth1 public
> > interface.
> > - if self.cl.is_redundant() and (not self.is_public() or
> > self.getDevice() not in PUBLIC_INTERFACE):
> > + if self.cl.is_redundant() and (not self.is_public() or
> > (self.config.is_vpc() and self.getDevice() not in VPC_PUBLIC_INTERFACE)):
> > CsHelper.execute(cmd2)
> > # if not redundant bring everything up
> > if not self.cl.is_redundant():
> >
> > ---
> >
> > - Wei
> >
> > 2016-04-01 20:13 GMT+02:00 Dean Close <de...@icloudhosting.com>:
> >
> >> Hi guys,
> >>
> >> I had been investigating a possible bug with the way interfaces are
> >> managed on virtual routers. The public interfaces are being brought up
> on
> >> backup routers and (because they boot second) they arp the IPs away from
> >> the master. I'd been examining an idea for a fix but whilst doing so I
> >> found that the system appears to be designed to bring up these
> interfaces.
> >>
> >> I suspect that a few things need to be reworked - but the changes
> >> necessary go so far against what has been implemented that I wanted to
> open
> >> this up before doing the work.
> >>
> >> Hopefully if I go through my findings you guys can help me see what I
> >> might be getting wrong.
> >>
> >> The following was correct for pre-4.6 redundant routers:
> >>
> >> 1. Both routers get configured with IP addresses, routes and iptables
> >> rules.
> >> 2. Public interfaces are initially set as DOWN.
> >> 3. Keepalived runs a VRRP instance on the private interface (eth0) to
> >> negotiate MASTER/BACKUP roles.
> >> 4. Keepalived manages the virtual IP on eth0 used as the public gateway
> >> for the guest VMs.
> >> 5. Keepalived uses a master notify script to bring up the public
> >> interfaces.
> >>
> >> The above was true for pre-4.6 routers. Now, however, things appear to
> >> work differently:
> >>
> >> 1. Both routers get configured as before.
> >> 2. All interfaces apart from eth1 (the Hypervisor-link interface) are
> >> set as UP.
> >> 3. Keepalived runs a VRRP instance on the first public interface (eth2)
> >> to negotiate MASTER/BACKUP roles.
> >> 4. Keepalived manages the virtual IP as before.
> >> 5. Keepalived uses a master notify script to bring up the public
> >> interfaces (unnecessary)
> >> 6. Keepalived uses a backup notify script to bring down the public
> >> interfaces (unused)
> >>
> >> This is unexpected for the following reasons:
> >>
> >> 1. The keepalived notify script brings the public interfaces down when
> >> transitioning to BACKUP - so how can we expect to run a VRRP instance
> over
> >> eth2?
> >> 2. If interfaces are down when transitioning to BACKUP, why are they
> >> not expected to be down to begin with? (Before the router has become
> MASTER)
> >> 3. Why are we running a VRRP instance over an interface with an IP that
> >> will clash with another host on the network?
> >>
> >> The following method from the CsIP class in
> >> /opt/cloud/bin/cs/CsAddress.py confuses matters futher:
> >>
> >> def check_is_up(self):
> >> """ Ensure device is up """
> >> cmd = "ip link show %s | grep 'state DOWN'" % self.getDevice()
> >> for i in CsHelper.execute(cmd):
> >> if " DOWN " in i:
> >> cmd2 = "ip link set %s up" % self.getDevice()
> >> # If redundant only bring up public interfaces that are
> >> not eth1.
> >> # Reason: private gateways are public interfaces.
> >> # master.py and keepalived will deal with eth1 public
> >> interface.
> >> if self.cl.is_redundant() and (not self.is_public() or
> >> self.getDevice() not in PUBLIC_INTERFACE):
> >> CsHelper.execute(cmd2)
> >> # if not redundant bring everything up
> >> if not self.cl.is_redundant():
> >> CsHelper.execute(cmd2)
> >>
> >> The comments refer to eth1 as a public interface when this is the link
> to
> >> the hypervisor. Indeed, PUBLIC_INTERFACE is defined on line 31 as
> ['eth1'].
> >> But keepalived and master.py don't influence eth1 at all. This looks
> like a
> >> mistake.
> >>
> >> Lastly, the logic of this line looks flawed:
> >>
> >> if self.cl.is_redundant() and (not self.is_public() or self.getDevice()
> >> not in PUBLIC_INTERFACE)
> >>
> >> As PUBLIC_INTERFACE is limited to eth1, the `not self.is_public()` will
> >> be ignored. Public IPs will never be assigned to eth1, so this line
> >> evaluates as:
> >>
> >>
> >> if self.cl.is_redundant() and (self.getDevice() not in PUBLIC_INTERFACE)
> >>
> >> which reduces even further to:
> >>
> >> if self.cs.is_redundant() and self.is_control()
> >>
> >>
> >> What would need doing
> >> ---------------------
> >>
> >> 1. The keepalived.conf template would need to be changed to run the
> >> VRRP instance over eth0.
> >> 2. The check_is_up method of the CsIP class should be renamed to
> >> 'bring_up_interfaces'. For redundant routers it should ignore IPs that
> pass
> >> is_public or needs_vrrp.
> >> 3. The arpPing method should do nothing if the interface is down.
> >> 4. The PUBLIC_INTERFACE constant should be either renamed or dropped
> >> altogether.
> >> 5. Other things that I haven't considered?
> >>
> >>
> >> I'd really appreciate any feedback on this. It's possible that I've got
> >> it all wrong but I'm suspecting not. I just don't want to tread on
> anyone's
> >> toes by submitting a PR that goes against what appears to be an explicit
> >> design decision.
> >>
> >>
> >> Kind regards,
> >>
> >> Dean Close
> >> iCloudHosting.com
> >> http://www.icloudhosting.com
> >> Tel: 01582 227927
> >>
> >> Unit 2, Smallmead Road, Reading RG2 0QS
> >>
> >> ******************************************************************
> >> The names iCloudHosting and iCloudHosting.com are trading styles of BBS
> >> Commerce Ltd which is registered in England and Wales, Company Number
> >> 04837714. Please use our trading address above for mail. Our registered
> >> office is 5 Theale Lakes Business Park, Moulden Way, Sulhamstead,
> Reading,
> >> Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94.
> >>
> >> This email and any files transmitted with it are confidential and
> >> intended solely for the use of the individual or entity to whom they are
> >> addressed. If you are not the intended recipient you are not authorised
> to
> >> and must not disclose, copy, distribute, or retain this message or any
> part
> >> of it.
> >>
> >> iCloudHosting accepts no responsibility for information, errors or
> >> omissions in this email.
> >> ******************************************************************
> >>
> >>
> >
>

Re: Redundant Router Interfaces

Posted by Dean Close <de...@icloudhosting.com>.
Hi Wei,

It looks like your patch will prevent redundant routers from bringing up public interfaces. We also need to:

1. shift the VRRP instance over to eth0 as eth2 will be down on the backup.
2. Ensure that eth0 and eth1 are up on both redundant virtual routers.

Your patch might already solve point 2. What do you think?

Point 1 will require modifying the keepalived.conf template. Do you want to handle this in your PR or shall I arrange that?

Kind Regards,

Dean Close
iCloudHosting.com
http://www.icloudhosting.com
Tel: 01582 227927

Spectrum House, Dunstable Road, Redbourn, AL3 7PR


******************************************************************

The names iCloudHosting and iCloudHosting.com are trading styles of BBS Commerce Ltd which is registered in England and Wales, Company Number 04837714. Please use our trading address above for mail. Our registered office is 5 Theale Lakes Business Park, Moulden Way, Sulhamstead, Reading, Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94.

This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you are not the intended recipient you are not authorised to and must not disclose, copy, distribute, or retain this message or any part of it.

iCloudHosting accepts no responsibility for information, errors or omissions in this email.

******************************************************************




On Fri, Apr 1, 2016 at 8:34 pm, Wei ZHOU <us...@gmail.com> wrote:
actually not only this issue, but more
(1) restart vpc router will break the network if there are multiple tiers
in the vpc.
(2) check_heartbeat.sh does not work.

I will create tickets for them next Monday, and create PR on github.

-Wei

2016-04-01 21:14 GMT+02:00 Wei ZHOU <us...@gmail.com>:

> Dean,
>
> I just fixed it yesterday.
>
> the commit is
>
> ---
>
> diff --git a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> index 5f63c06..5256d03 100755
> --- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> +++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> @@ -27,7 +27,7 @@ from CsRoute import CsRoute
> from CsRule import CsRule
>
> VRRP_TYPES = ['guest']
> -PUBLIC_INTERFACE = ['eth1']
> +VPC_PUBLIC_INTERFACE = ['eth1']
>
> class CsAddress(CsDataBag):
>
> @@ -323,7 +323,7 @@ class CsIP:
> # If redundant only bring up public interfaces that are
> not eth1.
> # Reason: private gateways are public interfaces.
> # master.py and keepalived will deal with eth1 public
> interface.
> - if self.cl.is_redundant() and (not self.is_public() or
> self.getDevice() not in PUBLIC_INTERFACE):
> + if self.cl.is_redundant() and (not self.is_public() or
> (self.config.is_vpc() and self.getDevice() not in VPC_PUBLIC_INTERFACE)):
> CsHelper.execute(cmd2)
> # if not redundant bring everything up
> if not self.cl.is_redundant():
>
> ---
>
> - Wei
>
> 2016-04-01 20:13 GMT+02:00 Dean Close <de...@icloudhosting.com>:
>
>> Hi guys,
>>
>> I had been investigating a possible bug with the way interfaces are
>> managed on virtual routers. The public interfaces are being brought up on
>> backup routers and (because they boot second) they arp the IPs away from
>> the master. I'd been examining an idea for a fix but whilst doing so I
>> found that the system appears to be designed to bring up these interfaces.
>>
>> I suspect that a few things need to be reworked - but the changes
>> necessary go so far against what has been implemented that I wanted to open
>> this up before doing the work.
>>
>> Hopefully if I go through my findings you guys can help me see what I
>> might be getting wrong.
>>
>> The following was correct for pre-4.6 redundant routers:
>>
>> 1. Both routers get configured with IP addresses, routes and iptables
>> rules.
>> 2. Public interfaces are initially set as DOWN.
>> 3. Keepalived runs a VRRP instance on the private interface (eth0) to
>> negotiate MASTER/BACKUP roles.
>> 4. Keepalived manages the virtual IP on eth0 used as the public gateway
>> for the guest VMs.
>> 5. Keepalived uses a master notify script to bring up the public
>> interfaces.
>>
>> The above was true for pre-4.6 routers. Now, however, things appear to
>> work differently:
>>
>> 1. Both routers get configured as before.
>> 2. All interfaces apart from eth1 (the Hypervisor-link interface) are
>> set as UP.
>> 3. Keepalived runs a VRRP instance on the first public interface (eth2)
>> to negotiate MASTER/BACKUP roles.
>> 4. Keepalived manages the virtual IP as before.
>> 5. Keepalived uses a master notify script to bring up the public
>> interfaces (unnecessary)
>> 6. Keepalived uses a backup notify script to bring down the public
>> interfaces (unused)
>>
>> This is unexpected for the following reasons:
>>
>> 1. The keepalived notify script brings the public interfaces down when
>> transitioning to BACKUP - so how can we expect to run a VRRP instance over
>> eth2?
>> 2. If interfaces are down when transitioning to BACKUP, why are they
>> not expected to be down to begin with? (Before the router has become MASTER)
>> 3. Why are we running a VRRP instance over an interface with an IP that
>> will clash with another host on the network?
>>
>> The following method from the CsIP class in
>> /opt/cloud/bin/cs/CsAddress.py confuses matters futher:
>>
>> def check_is_up(self):
>> """ Ensure device is up """
>> cmd = "ip link show %s | grep 'state DOWN'" % self.getDevice()
>> for i in CsHelper.execute(cmd):
>> if " DOWN " in i:
>> cmd2 = "ip link set %s up" % self.getDevice()
>> # If redundant only bring up public interfaces that are
>> not eth1.
>> # Reason: private gateways are public interfaces.
>> # master.py and keepalived will deal with eth1 public
>> interface.
>> if self.cl.is_redundant() and (not self.is_public() or
>> self.getDevice() not in PUBLIC_INTERFACE):
>> CsHelper.execute(cmd2)
>> # if not redundant bring everything up
>> if not self.cl.is_redundant():
>> CsHelper.execute(cmd2)
>>
>> The comments refer to eth1 as a public interface when this is the link to
>> the hypervisor. Indeed, PUBLIC_INTERFACE is defined on line 31 as ['eth1'].
>> But keepalived and master.py don't influence eth1 at all. This looks like a
>> mistake.
>>
>> Lastly, the logic of this line looks flawed:
>>
>> if self.cl.is_redundant() and (not self.is_public() or self.getDevice()
>> not in PUBLIC_INTERFACE)
>>
>> As PUBLIC_INTERFACE is limited to eth1, the `not self.is_public()` will
>> be ignored. Public IPs will never be assigned to eth1, so this line
>> evaluates as:
>>
>>
>> if self.cl.is_redundant() and (self.getDevice() not in PUBLIC_INTERFACE)
>>
>> which reduces even further to:
>>
>> if self.cs.is_redundant() and self.is_control()
>>
>>
>> What would need doing
>> ---------------------
>>
>> 1. The keepalived.conf template would need to be changed to run the
>> VRRP instance over eth0.
>> 2. The check_is_up method of the CsIP class should be renamed to
>> 'bring_up_interfaces'. For redundant routers it should ignore IPs that pass
>> is_public or needs_vrrp.
>> 3. The arpPing method should do nothing if the interface is down.
>> 4. The PUBLIC_INTERFACE constant should be either renamed or dropped
>> altogether.
>> 5. Other things that I haven't considered?
>>
>>
>> I'd really appreciate any feedback on this. It's possible that I've got
>> it all wrong but I'm suspecting not. I just don't want to tread on anyone's
>> toes by submitting a PR that goes against what appears to be an explicit
>> design decision.
>>
>>
>> Kind regards,
>>
>> Dean Close
>> iCloudHosting.com
>> http://www.icloudhosting.com
>> Tel: 01582 227927
>>
>> Unit 2, Smallmead Road, Reading RG2 0QS
>>
>> ******************************************************************
>> The names iCloudHosting and iCloudHosting.com are trading styles of BBS
>> Commerce Ltd which is registered in England and Wales, Company Number
>> 04837714. Please use our trading address above for mail. Our registered
>> office is 5 Theale Lakes Business Park, Moulden Way, Sulhamstead, Reading,
>> Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94.
>>
>> This email and any files transmitted with it are confidential and
>> intended solely for the use of the individual or entity to whom they are
>> addressed. If you are not the intended recipient you are not authorised to
>> and must not disclose, copy, distribute, or retain this message or any part
>> of it.
>>
>> iCloudHosting accepts no responsibility for information, errors or
>> omissions in this email.
>> ******************************************************************
>>
>>
>

Re: Redundant Router Interfaces

Posted by Wei ZHOU <us...@gmail.com>.
actually not only this issue, but more
(1) restart vpc router will break the network if there are multiple tiers
in the vpc.
(2) check_heartbeat.sh does not work.

I will create tickets for them next Monday, and create PR on github.

-Wei

2016-04-01 21:14 GMT+02:00 Wei ZHOU <us...@gmail.com>:

> Dean,
>
> I just fixed it yesterday.
>
> the commit is
>
> ---
>
> diff --git a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> index 5f63c06..5256d03 100755
> --- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> +++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> @@ -27,7 +27,7 @@ from CsRoute import CsRoute
>  from CsRule import CsRule
>
>  VRRP_TYPES = ['guest']
> -PUBLIC_INTERFACE = ['eth1']
> +VPC_PUBLIC_INTERFACE = ['eth1']
>
>  class CsAddress(CsDataBag):
>
> @@ -323,7 +323,7 @@ class CsIP:
>                  # If redundant only bring up public interfaces that are
> not eth1.
>                  # Reason: private gateways are public interfaces.
>                  # master.py and keepalived will deal with eth1 public
> interface.
> -                if self.cl.is_redundant() and (not self.is_public() or
> self.getDevice() not in PUBLIC_INTERFACE):
> +                if self.cl.is_redundant() and (not self.is_public() or
> (self.config.is_vpc() and self.getDevice() not in VPC_PUBLIC_INTERFACE)):
>                      CsHelper.execute(cmd2)
>                  # if not redundant bring everything up
>                  if not self.cl.is_redundant():
>
> ---
>
> - Wei
>
> 2016-04-01 20:13 GMT+02:00 Dean Close <de...@icloudhosting.com>:
>
>> Hi guys,
>>
>> I had been investigating a possible bug with the way interfaces are
>> managed on virtual routers. The public interfaces are being brought up on
>> backup routers and (because they boot second) they arp the IPs away from
>> the master. I'd been examining an idea for a fix but whilst doing so I
>> found that the system appears to be designed to bring up these interfaces.
>>
>> I suspect that a few things need to be reworked - but the changes
>> necessary go so far against what has been implemented that I wanted to open
>> this up before doing the work.
>>
>> Hopefully if I go through my findings you guys can help me see what I
>> might be getting wrong.
>>
>> The following was correct for pre-4.6 redundant routers:
>>
>>   1. Both routers get configured with IP addresses, routes and iptables
>> rules.
>>   2. Public interfaces are initially set as DOWN.
>>   3. Keepalived runs a VRRP instance on the private interface (eth0) to
>> negotiate MASTER/BACKUP roles.
>>   4. Keepalived manages the virtual IP on eth0 used as the public gateway
>> for the guest VMs.
>>   5. Keepalived uses a master notify script to bring up the public
>> interfaces.
>>
>> The above was true for pre-4.6 routers. Now, however, things appear to
>> work differently:
>>
>>   1. Both routers get configured as before.
>>   2. All interfaces apart from eth1 (the Hypervisor-link interface) are
>> set as UP.
>>   3. Keepalived runs a VRRP instance on the first public interface (eth2)
>> to negotiate MASTER/BACKUP roles.
>>   4. Keepalived manages the virtual IP as before.
>>   5. Keepalived uses a master notify script to bring up the public
>> interfaces (unnecessary)
>>   6. Keepalived uses a backup notify script to bring down the public
>> interfaces (unused)
>>
>> This is unexpected for the following reasons:
>>
>>   1. The keepalived notify script brings the public interfaces down when
>> transitioning to BACKUP - so how can we expect to run a VRRP instance over
>> eth2?
>>   2. If interfaces are down when transitioning to BACKUP, why are they
>> not expected to be down to begin with? (Before the router has become MASTER)
>>   3. Why are we running a VRRP instance over an interface with an IP that
>> will clash with another host on the network?
>>
>> The following method from the CsIP class in
>> /opt/cloud/bin/cs/CsAddress.py confuses matters futher:
>>
>>     def check_is_up(self):
>>          """ Ensure device is up """
>>          cmd = "ip link show %s | grep 'state DOWN'" % self.getDevice()
>>          for i in CsHelper.execute(cmd):
>>              if " DOWN " in i:
>>                  cmd2 = "ip link set %s up" % self.getDevice()
>>                  # If redundant only bring up public interfaces that are
>> not eth1.
>>                  # Reason: private gateways are public interfaces.
>>                  # master.py and keepalived will deal with eth1 public
>> interface.
>>                  if self.cl.is_redundant() and (not self.is_public() or
>> self.getDevice() not in PUBLIC_INTERFACE):
>>                      CsHelper.execute(cmd2)
>>                  # if not redundant bring everything up
>>                  if not self.cl.is_redundant():
>>                      CsHelper.execute(cmd2)
>>
>> The comments refer to eth1 as a public interface when this is the link to
>> the hypervisor. Indeed, PUBLIC_INTERFACE is defined on line 31 as ['eth1'].
>> But keepalived and master.py don't influence eth1 at all. This looks like a
>> mistake.
>>
>> Lastly, the logic of this line looks flawed:
>>
>>   if self.cl.is_redundant() and (not self.is_public() or self.getDevice()
>> not in PUBLIC_INTERFACE)
>>
>> As PUBLIC_INTERFACE is limited to eth1, the `not self.is_public()` will
>> be ignored. Public IPs will never be assigned to eth1, so this line
>> evaluates as:
>>
>>
>>   if self.cl.is_redundant() and (self.getDevice() not in PUBLIC_INTERFACE)
>>
>> which reduces even further to:
>>
>>   if self.cs.is_redundant() and self.is_control()
>>
>>
>> What would need doing
>> ---------------------
>>
>>   1. The keepalived.conf template would need to be changed to run the
>> VRRP instance over eth0.
>>   2. The check_is_up method of the CsIP class should be renamed to
>> 'bring_up_interfaces'. For redundant routers it should ignore IPs that pass
>> is_public or needs_vrrp.
>>   3. The arpPing method should do nothing if the interface is down.
>>   4. The PUBLIC_INTERFACE constant should be either renamed or dropped
>> altogether.
>>   5. Other things that I haven't considered?
>>
>>
>> I'd really appreciate any feedback on this. It's possible that I've got
>> it all wrong but I'm suspecting not. I just don't want to tread on anyone's
>> toes by submitting a PR that goes against what appears to be an explicit
>> design decision.
>>
>>
>> Kind regards,
>>
>> Dean Close
>> iCloudHosting.com
>> http://www.icloudhosting.com
>> Tel: 01582 227927
>>
>> Unit 2, Smallmead Road, Reading RG2 0QS
>>
>> ******************************************************************
>> The names iCloudHosting and iCloudHosting.com are trading styles of BBS
>> Commerce Ltd which is registered in England and Wales, Company Number
>> 04837714. Please use our trading address above for mail. Our registered
>> office is 5 Theale Lakes Business Park, Moulden Way, Sulhamstead, Reading,
>> Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94.
>>
>> This email and any files transmitted with it are confidential and
>> intended solely for the use of the individual or entity to whom they are
>> addressed. If you are not the intended recipient you are not authorised to
>> and must not disclose, copy, distribute, or retain this message or any part
>> of it.
>>
>> iCloudHosting accepts no responsibility for information, errors or
>> omissions in this email.
>> ******************************************************************
>>
>>
>

Re: Redundant Router Interfaces

Posted by Wei ZHOU <us...@gmail.com>.
Dean,

I just fixed it yesterday.

the commit is

---

diff --git a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
index 5f63c06..5256d03 100755
--- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
+++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
@@ -27,7 +27,7 @@ from CsRoute import CsRoute
 from CsRule import CsRule

 VRRP_TYPES = ['guest']
-PUBLIC_INTERFACE = ['eth1']
+VPC_PUBLIC_INTERFACE = ['eth1']

 class CsAddress(CsDataBag):

@@ -323,7 +323,7 @@ class CsIP:
                 # If redundant only bring up public interfaces that are
not eth1.
                 # Reason: private gateways are public interfaces.
                 # master.py and keepalived will deal with eth1 public
interface.
-                if self.cl.is_redundant() and (not self.is_public() or
self.getDevice() not in PUBLIC_INTERFACE):
+                if self.cl.is_redundant() and (not self.is_public() or
(self.config.is_vpc() and self.getDevice() not in VPC_PUBLIC_INTERFACE)):
                     CsHelper.execute(cmd2)
                 # if not redundant bring everything up
                 if not self.cl.is_redundant():

---

- Wei

2016-04-01 20:13 GMT+02:00 Dean Close <de...@icloudhosting.com>:

> Hi guys,
>
> I had been investigating a possible bug with the way interfaces are
> managed on virtual routers. The public interfaces are being brought up on
> backup routers and (because they boot second) they arp the IPs away from
> the master. I'd been examining an idea for a fix but whilst doing so I
> found that the system appears to be designed to bring up these interfaces.
>
> I suspect that a few things need to be reworked - but the changes
> necessary go so far against what has been implemented that I wanted to open
> this up before doing the work.
>
> Hopefully if I go through my findings you guys can help me see what I
> might be getting wrong.
>
> The following was correct for pre-4.6 redundant routers:
>
>   1. Both routers get configured with IP addresses, routes and iptables
> rules.
>   2. Public interfaces are initially set as DOWN.
>   3. Keepalived runs a VRRP instance on the private interface (eth0) to
> negotiate MASTER/BACKUP roles.
>   4. Keepalived manages the virtual IP on eth0 used as the public gateway
> for the guest VMs.
>   5. Keepalived uses a master notify script to bring up the public
> interfaces.
>
> The above was true for pre-4.6 routers. Now, however, things appear to
> work differently:
>
>   1. Both routers get configured as before.
>   2. All interfaces apart from eth1 (the Hypervisor-link interface) are
> set as UP.
>   3. Keepalived runs a VRRP instance on the first public interface (eth2)
> to negotiate MASTER/BACKUP roles.
>   4. Keepalived manages the virtual IP as before.
>   5. Keepalived uses a master notify script to bring up the public
> interfaces (unnecessary)
>   6. Keepalived uses a backup notify script to bring down the public
> interfaces (unused)
>
> This is unexpected for the following reasons:
>
>   1. The keepalived notify script brings the public interfaces down when
> transitioning to BACKUP - so how can we expect to run a VRRP instance over
> eth2?
>   2. If interfaces are down when transitioning to BACKUP, why are they not
> expected to be down to begin with? (Before the router has become MASTER)
>   3. Why are we running a VRRP instance over an interface with an IP that
> will clash with another host on the network?
>
> The following method from the CsIP class in /opt/cloud/bin/cs/CsAddress.py
> confuses matters futher:
>
>     def check_is_up(self):
>          """ Ensure device is up """
>          cmd = "ip link show %s | grep 'state DOWN'" % self.getDevice()
>          for i in CsHelper.execute(cmd):
>              if " DOWN " in i:
>                  cmd2 = "ip link set %s up" % self.getDevice()
>                  # If redundant only bring up public interfaces that are
> not eth1.
>                  # Reason: private gateways are public interfaces.
>                  # master.py and keepalived will deal with eth1 public
> interface.
>                  if self.cl.is_redundant() and (not self.is_public() or
> self.getDevice() not in PUBLIC_INTERFACE):
>                      CsHelper.execute(cmd2)
>                  # if not redundant bring everything up
>                  if not self.cl.is_redundant():
>                      CsHelper.execute(cmd2)
>
> The comments refer to eth1 as a public interface when this is the link to
> the hypervisor. Indeed, PUBLIC_INTERFACE is defined on line 31 as ['eth1'].
> But keepalived and master.py don't influence eth1 at all. This looks like a
> mistake.
>
> Lastly, the logic of this line looks flawed:
>
>   if self.cl.is_redundant() and (not self.is_public() or self.getDevice()
> not in PUBLIC_INTERFACE)
>
> As PUBLIC_INTERFACE is limited to eth1, the `not self.is_public()` will be
> ignored. Public IPs will never be assigned to eth1, so this line evaluates
> as:
>
>
>   if self.cl.is_redundant() and (self.getDevice() not in PUBLIC_INTERFACE)
>
> which reduces even further to:
>
>   if self.cs.is_redundant() and self.is_control()
>
>
> What would need doing
> ---------------------
>
>   1. The keepalived.conf template would need to be changed to run the VRRP
> instance over eth0.
>   2. The check_is_up method of the CsIP class should be renamed to
> 'bring_up_interfaces'. For redundant routers it should ignore IPs that pass
> is_public or needs_vrrp.
>   3. The arpPing method should do nothing if the interface is down.
>   4. The PUBLIC_INTERFACE constant should be either renamed or dropped
> altogether.
>   5. Other things that I haven't considered?
>
>
> I'd really appreciate any feedback on this. It's possible that I've got it
> all wrong but I'm suspecting not. I just don't want to tread on anyone's
> toes by submitting a PR that goes against what appears to be an explicit
> design decision.
>
>
> Kind regards,
>
> Dean Close
> iCloudHosting.com
> http://www.icloudhosting.com
> Tel: 01582 227927
>
> Unit 2, Smallmead Road, Reading RG2 0QS
>
> ******************************************************************
> The names iCloudHosting and iCloudHosting.com are trading styles of BBS
> Commerce Ltd which is registered in England and Wales, Company Number
> 04837714. Please use our trading address above for mail. Our registered
> office is 5 Theale Lakes Business Park, Moulden Way, Sulhamstead, Reading,
> Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94.
>
> This email and any files transmitted with it are confidential and intended
> solely for the use of the individual or entity to whom they are addressed.
> If you are not the intended recipient you are not authorised to and must
> not disclose, copy, distribute, or retain this message or any part of it.
>
> iCloudHosting accepts no responsibility for information, errors or
> omissions in this email.
> ******************************************************************
>
>