Posted to dev@cloudstack.apache.org by Daan Hoogland <da...@shapeblue.com> on 2018/02/15 15:36:14 UTC

[PROPOSAL] reducing VR downtime on upgrade

This proposal aims to set out a way forward for reducing maintenance downtime of virtual routers. It has two parts:

  1.  Dealing with legacy routers by replacing them before shutting them down.
  2.  Unifying router embodiments and making use of redundancy mechanisms to fail over quickly from old to new.

Ad .1 There will always be cases where a router is too old to talk to the new version that is to replace it. This might be due to a keepalived update or replacement, or simply because the router is very old. So although unifying the routers and making them redundancy-enabled will solve a lot of use cases, it will never cover every conceivable situation, not even in systems upgraded to a version in which all intended functionality has been implemented. Dealing with any older router is to work as follows:

  1.  A check will be done to make sure the old VR is still up.
     *   If it is not, there is nothing to take into consideration and it will be replaced as quickly as possible. Possible improvements here are the iptables configuration speedup and other generic optimisations unrelated to the upgrade itself.
     *   If it is, we need to walk on eggshells when provisioning the new one😉
  2.  A new VR will be instantiated.
  3.  Configuration data will be sent but not applied.
  4.  The interfaces will be added and, if need be, brought down.
  5.  All configuration is applied.
  6.  The old VR is killed.
  7.  The interfaces on the new VR are brought up.
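
The replacement procedure for a still-running legacy router can be sketched as follows. This is a minimal Python sketch under stated assumptions: `Router` and `replace_router` are hypothetical stand-ins, not CloudStack classes, and each step only records what would happen so that the ordering is visible. In this sketch the downtime window is the gap between killing the old VR (step 6) and bringing up the new VR's interfaces (step 7).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Router:
    """Hypothetical stand-in for a VR; it only records what would happen."""
    name: str
    events: List[str]              # shared log, so ordering across both VRs is visible
    alive: bool = True
    staged: Optional[dict] = None  # configuration received but not yet applied
    applied: Optional[dict] = None
    interfaces_up: bool = False

    def record(self, action: str) -> None:
        self.events.append(f"{self.name}: {action}")


def replace_router(old: Router, config: dict) -> Router:
    """Hypothetical sketch of the seven-step replacement of a legacy VR."""
    # Step 1: if the old VR is already down there is nothing to protect,
    # so configure the new VR and bring it up in one go.
    if not old.alive:
        new = Router("new", old.events)
        new.applied = dict(config)
        new.interfaces_up = True
        new.record("fast path: configured and up")
        return new

    new = Router("new", old.events)                 # step 2: instantiate
    new.record("instantiated")
    new.staged = dict(config)                       # step 3: send, do not apply
    new.record("config staged")
    new.interfaces_up = False                       # step 4: add interfaces, kept down
    new.record("interfaces added (down)")
    new.applied, new.staged = new.staged, None      # step 5: apply everything
    new.record("config applied")
    old.alive, old.interfaces_up = False, False     # step 6: downtime starts here
    old.record("killed")
    new.interfaces_up = True                        # step 7: downtime ends here
    new.record("interfaces up")
    return new


if __name__ == "__main__":
    events: List[str] = []
    old = Router("old", events, applied={"fw": "v1"}, interfaces_up=True)
    replace_router(old, {"fw": "v2"})
    for line in events:
        print(line)
```

Keeping the interfaces down until the old VR is gone is what avoids two routers answering on the same addresses at once; everything slow (instantiation, configuration) happens before the downtime window opens.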

Ad .2 This is a long-term goal. At the moment we have five (or debatably six) different incarnations of the virtual router:

  *   Basic zone dhcp server
  *   Shared network ‘router’
  *   VR
  *   rVR
  *   VPC
  *   rVPC
A first set of steps will be to reduce this to:

  *   shared networks (where a basic zone is an automatic implementation of a single shared network in a zone)
  *   VR (which is always redundancy-enabled but may have only one instance)
  *   VPC (as above)

The next step is then to unify VR and VPC, as a VR is really only a VPC with just one network.
The final step is to unify a shared network with a VPC. This one is so far ahead that I don’t want to make too many statements about it now; we will have to discover the exact implementation hazards of this step along the way. I expect we will be at least one year in by the time we reach this point.
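
The consolidation plan can be pictured as a mapping between router kinds. The Python sketch below is illustrative only: the enum and class names are hypothetical and do not correspond to CloudStack's actual classes, and the instance counts (one for a formerly non-redundant router, two for a redundant pair) are an assumption. The final step, unifying shared networks with VPCs, is deliberately left out, in line with the proposal.

```python
from dataclasses import dataclass
from enum import Enum


class Legacy(Enum):
    """The six current incarnations listed in the proposal."""
    BASIC_ZONE_DHCP = "Basic zone dhcp server"
    SHARED_NETWORK_ROUTER = "Shared network router"
    VR = "VR"
    RVR = "rVR"
    VPC = "VPC"
    RVPC = "rVPC"


class Kind(Enum):
    """The three kinds left after the first set of steps."""
    SHARED_NETWORK = "shared network"
    VR = "VR"
    VPC = "VPC"


# First set of steps: redundancy stops being a separate kind and becomes
# a property (the instance count) of the router group.
FIRST_STEP = {
    Legacy.BASIC_ZONE_DHCP: Kind.SHARED_NETWORK,  # basic zone = one automatic shared network
    Legacy.SHARED_NETWORK_ROUTER: Kind.SHARED_NETWORK,
    Legacy.VR: Kind.VR,
    Legacy.RVR: Kind.VR,
    Legacy.VPC: Kind.VPC,
    Legacy.RVPC: Kind.VPC,
}


@dataclass(frozen=True)
class RouterGroup:
    kind: Kind
    instances: int  # assumption: 1 for a formerly non-redundant router, 2 for a redundant pair


def first_step(legacy: Legacy) -> RouterGroup:
    instances = 2 if legacy in (Legacy.RVR, Legacy.RVPC) else 1
    return RouterGroup(FIRST_STEP[legacy], instances)


def second_step(group: RouterGroup) -> RouterGroup:
    """Unify VR into VPC: a VR becomes a VPC whose only tier is its guest network."""
    if group.kind is Kind.VR:
        return RouterGroup(Kind.VPC, group.instances)
    return group
```

Treating a single VR as a redundancy-enabled group of one is what lets the part 1 failover machinery apply to every router, rather than only to the ones deployed as redundant pairs.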

We at Shapeblue will be starting a short PoC on the first part. We will try to figure out whether the process under .1 is feasible, or whether we need to wait with configuring the interfaces until the last moment and then do a ‘blind’ start.

daan.hoogland@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue


Re: [PROPOSAL] reducing VR downtime on upgrade

Posted by Daan Hoogland <da...@gmail.com>.
Thanks Wido,
I expect to start a PoC on phase 1 shortly.

On Wed, Feb 21, 2018 at 9:38 PM, Wido den Hollander <wi...@widodh.nl> wrote:



-- 
Daan

Re: [PROPOSAL] reducing VR downtime on upgrade

Posted by Wido den Hollander <wi...@widodh.nl>.

On 02/15/2018 04:36 PM, Daan Hoogland wrote:

Looks good! We might want the VR to send out its version as well over 
the local socket. Using that 'version' you could see whether it supports 
various things.

You could even have the VR send out 'features' so that you know what 
it's capable of.
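
Wido's suggestion could look roughly like the sketch below. It is a hypothetical Python illustration: the one-line JSON message format, the feature names, the 4.11 minimum version and the function names are all assumptions, and `socket.socketpair()` merely stands in for the VR's local socket. The management side picks the redundancy-based failover only when the old VR advertises everything that path needs, and otherwise falls back to the legacy replacement procedure from part 1.

```python
import json
import socket

# Hypothetical feature names a VR would need to advertise for a fast failover.
REQUIRED_FOR_FAILOVER = {"keepalived-vrrp", "staged-config"}


def announce(sock: socket.socket, version: str, features: set) -> None:
    """VR side: send one newline-terminated JSON message describing itself."""
    msg = {"version": version, "features": sorted(features)}
    sock.sendall((json.dumps(msg) + "\n").encode("utf-8"))


def read_announcement(sock: socket.socket) -> dict:
    """Management side: read until the first newline and decode that message."""
    buf = b""
    while b"\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("VR closed the socket before announcing itself")
        buf += chunk
    line, _, _ = buf.partition(b"\n")
    return json.loads(line.decode("utf-8"))


def parse_version(text: str) -> tuple:
    # Assumes purely numeric dotted versions such as "4.11.0".
    return tuple(int(part) for part in text.split("."))


def upgrade_path(announcement: dict, min_version: tuple = (4, 11)) -> str:
    """Choose the failover path only if the old VR is new and capable enough."""
    version = parse_version(announcement.get("version", "0"))
    features = set(announcement.get("features", []))
    if version >= min_version and REQUIRED_FOR_FAILOVER <= features:
        return "redundant-failover"
    # An old VR, or one that advertises too few features, gets the part 1 procedure.
    return "legacy-replace"


if __name__ == "__main__":
    vr_end, mgmt_end = socket.socketpair()
    announce(vr_end, "4.11.0", {"keepalived-vrrp", "staged-config"})
    print(upgrade_path(read_announcement(mgmt_end)))  # prints "redundant-failover"
    vr_end.close()
    mgmt_end.close()
```

Advertising features rather than relying on the version alone means a router that lacks, say, a compatible keepalived can be detected directly instead of being inferred from a version table.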


When counting the router incarnations, don't forget the metadata/password server that the VR runs in almost all cases.

Wido
