Posted to dev@cloudstack.apache.org by Roeland Kuipers <RK...@schubergphilis.com> on 2013/09/04 20:15:36 UTC

[DISCUSS] OOM killer and Routing/System VM's = :(

Hi Dev!

We have experienced a serious customer outage due to the OOM killer on a member of a redundant routing VM pair. Somehow the MASTER node ran out of memory and the OOM killer decided to kill random processes, taking HAProxy down. But since keepalived was still running and functioning, a failover never happened.
In our experience we would rather panic on OOM than pray that the OOM killer does the right thing, since in 99% of cases it just leaves the machine useless.
If this RvR had panicked and rebooted, we would have had a clean keepalived failure/failover without much impact on our customer.

So we decided to configure the following sysctl options:
        vm.panic_on_oom = 1
        kernel.panic_on_oops = 1
        kernel.panic = 10

With these settings the VM panics on OOM and reboots after 10 seconds, so the router just comes back in a healthy state instead of being left crippled by the OOM killer.
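
For reference, making this persistent on the (Debian-based) system VM could look roughly like the sketch below; whether /etc/sysctl.conf is the right place in the system VM image is an assumption on our side:

        # /etc/sysctl.conf (or a drop-in under /etc/sysctl.d/)
        vm.panic_on_oom = 1        # panic instead of letting the OOM killer pick victims
        kernel.panic_on_oops = 1   # treat kernel oopses as fatal as well
        kernel.panic = 10          # reboot 10 seconds after a panic

        # apply immediately, without a reboot
        sysctl -p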

But we hit a problem here with VPC routers: their configuration is not persistent across reboots when they are rebooted outside CloudStack, because they are not configured (entirely) via kernel parameters (/var/cache/cloud/cmdline) but only when started by CloudStack.

It would be nice if the VPC router config were persistent across reboots, even when rebooted outside CloudStack, using the same mechanism as the other system VMs, to make things more consistent and reliable.

What is your opinion on this? Otherwise we will add it to our backlog and contribute improvements in this area.

See also:

https://issues.apache.org/jira/browse/CLOUDSTACK-4605
https://issues.apache.org/jira/browse/CLOUDSTACK-4606
https://issues.apache.org/jira/browse/CLOUDSTACK-4607


Thanks & Cheers,
Roeland Kuipers




Re: [DISCUSS] OOM killer and Routing/System VM's = :(

Posted by Chiradeep Vittal <Ch...@citrix.com>.
Been thinking of this a little more.
From my experience with embedded programming, memory overcommit is not
usually allowed (the RAM is sized appropriately to the expected workload).
So, throwing this out there: should we set /proc/sys/vm/overcommit_memory
= 2 so that the kernel does not allow overcommit? Then user-space tasks
that try to allocate more memory than is available simply fail to allocate
(and typically exit) instead of triggering the OOM killer.
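
In sysctl terms, something like the following (a sketch; the
overcommit_ratio value is just an example and would need tuning for the
system VM's RAM size):

        vm.overcommit_memory = 2    # strict accounting: refuse allocations beyond the commit limit
        vm.overcommit_ratio = 100   # commit limit = swap + 100% of RAM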




RE: [DISCUSS] OOM killer and Routing/System VM's = :(

Posted by Funs Kessen <FK...@schubergphilis.com>.
Hi Alex and Chiradeep,

@Alex: Yes, that would work, but it also means that everybody would have to implement this on a machine that runs syslog, and that it would not be part of CloudStack. I think it would be wonderful to have the SystemVM, as an entity within CloudStack, be self-sustaining together with CloudStack, and not depend on external scripts that do API calls. For the short term, yes, it might be a viable solution, but in the long term it would feel kind of hack-ish.

@Chiradeep: I agree; it was also not acceptable to some of the guys on a Linux kernel IRC channel, and they had fair points, although I do believe people should have the option to choose. They pointed me towards kcrash, as I mentioned before. Yesterday I tested kcrash and it works. It means that a bit of memory is used to load a crash kernel and an "adapted" init that powers off the moment the crash kernel is loaded; it also means we can save the core and analyze why it crashed before powering off, if required. The watchdog functionality is something I found too, but I didn't feel entirely comfortable with it; I'll have a deeper look at it to see if it does the trick, so thanks for bringing it up!

Cheers,

Funs
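
To make that concrete, the mechanism described above looks roughly like this (a sketch only; the crashkernel size and file locations are illustrative assumptions, not the exact values used):

        # kernel command line: reserve memory for the crash kernel
        crashkernel=64M

        # minimal init run by the crash kernel after a panic
        #!/bin/sh
        # optionally save the core for later analysis, then power off so HA restarts the router
        cp /proc/vmcore /var/crash/vmcore 2>/dev/null
        /sbin/poweroff -f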




RE: [DISCUSS] OOM killer and Routing/System VM's = :(

Posted by Alex Huang <Al...@citrix.com>.
If I recall correctly, the OOM killer actually prints something into syslog, so a cron job that watches syslog and simply shuts down the VM should work.

--Alex
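
As a rough illustration, such a watcher could be as small as a cron entry (a sketch only; the match string and the choice to power off forcibly are assumptions, not an existing CloudStack script):

        # /etc/cron.d/oom-watch -- every minute, power off if the kernel has logged an OOM kill
        # (dmesg is checked rather than the syslog file so the trigger clears itself on the next boot)
        * * * * * root dmesg | grep -q 'Out of memory: Kill process' && /sbin/poweroff -f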



Re: [DISCUSS] OOM killer and Routing/System VM's = :(

Posted by Chiradeep Vittal <Ch...@citrix.com>.
Maintaining a custom kernel is a big hassle, even if it is a few lines of
code change. 
Can we do something in userspace? What about the software watchdog that is
available?
Along the lines of: http://goo.gl/oO3Lzr
http://linux.die.net/man/8/watchdog
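
Something along these lines in /etc/watchdog.conf might already cover the
OOM case (a sketch; the threshold is an assumption and the system VM
template would need the watchdog package installed):

        # /etc/watchdog.conf
        watchdog-device = /dev/watchdog
        min-memory      = 2560      # reboot if free memory drops below ~10 MB (4 KB pages)
        realtime        = yes
        priority        = 1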




RE: [DISCUSS] OOM killer and Routing/System VM's = :(

Posted by Funs Kessen <FK...@schubergphilis.com>.
> Well, you can't as far as I've looked in the source of panic.c. So I'm thinking of 
> investigating of adding -1 as an option and seeing if I can push halt in, let's hope 
> the guys that do kernel stuff find this useful too.....
>
So it seems the patch I conjured up for panic.c is seen as not so useful; there 
is however another way to achieve the same result. This would mean that we 
load a crash kernel with our own .sh script as init to do our bidding.

Would that be a plan?

Cheers,

Funs

Sent from my iPhone


Re: [DISCUSS] OOM killer and Routing/System VM's = :(

Posted by Funs Kessen <FK...@schubergphilis.com>.
Well, you can't, as far as I can tell from the source of panic.c. So I'm thinking of investigating adding -1 as an option and seeing if I can push a halt in; let's hope the guys that do kernel stuff find this useful too...

Cheers,

Funs

Sent from my iPhone

On 4 sep. 2013, at 23:35, "Marcus Sorensen" <sh...@gmail.com> wrote:

> What would work as a quick fix for this sort of situation would be if
> the machine could be configured to power off rather than rebooting on
> oom. Then the HA system would restart the VM, applying all configs.
> 
> Anyone know how to do that? :-)

Re: [DISCUSS] OOM killer and Routing/System VM's = :(

Posted by Marcus Sorensen <sh...@gmail.com>.
A quick fix for this sort of situation would be if the machine could be
configured to power off, rather than reboot, on OOM. Then the HA system
would restart the VM, applying all configs.

Anyone know how to do that? :-)


Re: [DISCUSS] OOM killer and Routing/System VM's = :(

Posted by Prasanna Santhanam <ts...@apache.org>.
On Wed, Sep 04, 2013 at 12:14:27PM -0700, Darren Shepherd wrote:
> I've thought about chef and puppet for this, but basically it comes
> down to two things.  I'm really interested in this being fast and
> light weight.  Ruby is neither of those.  So the core ACS stuff will
> probably remain as very simple shell scripts.  Simple in that they
> really just need to download configuration and restart services.
> They know nothing about the nature of the changes.  If, as an
> extension, you want to do something with puppet, chef, I'd be open
> to that.  That's your deal.

How about Ansible?
https://github.com/ansible/ansible
No custom agents, plain SSH, doesn't need root access. All
configuration is YAML based. You can extend it in any language.


-- 
Prasanna.,

------------------------
Powered by BigRock.com


Re: [DISCUSS] OOM killer and Routing/System VM's = :(

Posted by Darren Shepherd <da...@gmail.com>.
On 09/04/2013 11:37 AM, Roeland Kuipers wrote:
> Hi Darren,
>
> Thanks for your reply! Could you share a bit more on your plans/ideas?
>
> We also have been brainstorming on other approaches of managing the systemvm's, especially small customizations for specific tenants. And maybe even leveraging config mgmt tools like Chef or Puppet, with the ability to integrate CS with them in some way.
>

I'll have to send the full details later but here's a rough idea.  The 
basic approach is this.  Logical changes to the VRs (or system vms in 
general) get mapped to configuration items.  So adding a LB rule maps to 
iptables config and haproxy config.  When you change a LB rule we then 
bump up the requested version of the configuration for iptables/haproxy. 
  So the requested version will be 4 maybe.  The applied version will be 
3 as the VR still has the old configuration.  Since 4 != 3, the VR will 
be signaled to pull the latest iptables/haproxy config.  So it will pull 
the configuration.  Say in the mean time somebody else adds four other 
LB rules.  So the requested version is now at 8.  So when the VR pulls 
the config it will get version 8, and then reply back saying it applied 
version 8.  The applied version is now 8 which is greater than 4 (the 
version the first LB rule change was waiting for) so basically all async 
jobs waiting for the LB change will be done.
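
A toy sketch of that bookkeeping, just to illustrate the idea 
(illustrative names, not actual ACS code):

    class ConfigItem:
        def __init__(self):
            self.requested_version = 0   # bumped on every logical change (e.g. adding a LB rule)
            self.applied_version = 0     # last version the VR reported it applied

        def request_change(self):
            # a logical change bumps the requested version; the async job
            # remembers which version it is waiting for
            self.requested_version += 1
            return self.requested_version

        def needs_sync(self):
            # the VR is signaled to pull the latest full config while these differ
            return self.applied_version < self.requested_version

        def report_applied(self, version):
            # the VR pulled and applied the full config at 'version'
            self.applied_version = max(self.applied_version, version)

        def change_is_done(self, waiting_for):
            # a job waiting for version N is done once applied >= N, even if
            # later changes bumped the requested version higher in the meantime
            return self.applied_version >= waiting_for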

To pull the configuration from the VR, the VR will be hitting a 
templating configuration system.  So it pulls the full iptables and 
haproxy config.  Not incremental changes.

So if the VR ever reboots itself, it can easily just pull the latest 
config of everything and apply it.  So it will be consistent.

I'd be interested to hear what type of customizations you would like to 
add.  It will definitely be an extensible system, but the problem is if 
your extensions want to touch the same configuration files that ACS 
wants to manage.  That gets a bit tricky as it's really easy for each to 
break each other.  But I can definitely add some hooks that users can 
use to mess up things and "void the warranty."

I've thought about chef and puppet for this, but basically it comes down 
to two things.  I'm really interested in this being fast and light 
weight.  Ruby is neither of those.  So the core ACS stuff will probably 
remain as very simple shell scripts.  Simple in that they really just 
need to download configuration and restart services.  They know nothing 
about the nature of the changes.  If, as an extension, you want to do 
something with puppet, chef, I'd be open to that.  That's your deal.

This approach has many other benefits.  Like, for example, we can ensure 
that as we deploy a new ACS release existing system VMs can be updated 
(without a reboot, unless the kernel changes).  Additionally, it's fast 
and updates happen in near constant time.  So most changes will be just 
a couple of seconds, even if you have 4000 LB rules.

Darren


RE: [DISCUSS] OOM killer and Routing/System VM's = :(

Posted by Roeland Kuipers <RK...@schubergphilis.com>.
Hi Darren,

Thanks for your reply! Could you share a bit more on your plans/ideas? 

We also have been brainstorming on other approaches of managing the systemvm's, especially small customizations for specific tenants. And maybe even leveraging config mgmt tools like Chef or Puppet, with the ability to integrate CS with them in some way.

Cheers,
Roeland



Re: [DISCUSS] OOM killer and Routing/System VM's = :(

Posted by Darren Shepherd <da...@gmail.com>.
On 09/04/2013 11:15 AM, Roeland Kuipers wrote:
> It would be nice to see that the VPC router config is persistent across reboots even when rebooted outside cloudstack and using the same mechanism as the other system vm's to make things more consistent and reliable.
>
> What is your opinion on this? Otherwise will add it to our backlog to contribute improvements in this area.

This isn't terribly helpful for your immediate customer issue, but I'm 
in the process of putting together a proposal for a new approach to 
managing the System VMs.  Lots of things will change, but it will cover 
the case where the VM reboots itself or somebody does it from the 
hypervisor.  The VM will come back up and its current configuration will 
be restored.

It is a rather ambitious change so don't expect to see it done for a 
couple months.

Darren

Re: [DISCUSS] OOM killer and Routing/System VM's = :(

Posted by Chiradeep Vittal <Ch...@citrix.com>.
I'd support adding these parameters in some form in
/etc/init.d/cloud-early-config. Agree that the OOM killer is of no use.
