Posted to dev@cloudstack.apache.org by Chiradeep Vittal <Ch...@citrix.com> on 2013/12/02 18:55:08 UTC

Re: [DISCUSS] OOM killer and Routing/System VM's = :(

Been thinking of this a little more.
From my experience with embedded programming, memory overcommit is not
usually allowed (the RAM is sized appropriately to the expected workload).
So, throwing this out there: should we set /proc/sys/vm/overcommit_memory
= 2 so that the kernel does not allow overcommit? That way, a user-space
task that tries to allocate more memory than is available simply fails
the allocation (and typically dies), rather than the OOM killer picking
an arbitrary victim later.
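
For illustration, the knob could be baked into the system VM image via
/etc/sysctl.conf along these lines (the overcommit_ratio value below is
just an assumption, not a tested recommendation):

  # refuse allocations beyond the commit limit instead of overcommitting
  vm.overcommit_memory = 2
  # in mode 2 the commit limit is swap + overcommit_ratio% of RAM (default 50)
  vm.overcommit_ratio = 100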


On 9/6/13 2:35 AM, "Funs Kessen" <FK...@schubergphilis.com> wrote:

>Hi Alex and Chiradeep,
>
>@Alex: Yes, it would work, but it also means that everybody would have to
>implement this on a machine that runs syslog, and that it would not be
>part of CloudStack. I think it would be wonderful to have the SystemVM,
>as an entity within CloudStack, be self-sustaining together with
>CloudStack, and not depend on external scripts that make API calls.
>For the short term it might be a viable solution, but in the long term
>it would feel kind of hack-ish.
>
>@Chiradeep: I agree; it was also not acceptable to some of the guys on a
>Linux kernel IRC channel, and they had fair points, although I do believe
>people should have the option to choose. They pointed me towards kcrash,
>as I mentioned before. Yesterday I tested kcrash and it works. It means
>that a bit of memory is used to load a crash kernel along with an
>"adapted" init that does a poweroff the moment the crash kernel takes
>over; it also means we can save the core and analyze why it crashed
>before powering off, if required. The watchdog functionality is something
>I found too, but I didn't feel entirely comfortable with it somehow.
>I'll have a deeper look at it to see if it does the trick, so thanks for
>bringing it up!
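>
>Re the crash-kernel bit above, the rough shape of it is something like
>the following (paths, sizes, and append options here are just
>placeholders, not exact values):
>
>  # reserve memory for the crash kernel on the running kernel's cmdline
>  crashkernel=128M
>  # load the panic kernel; its initrd carries the "adapted" init that
>  # (optionally) saves the core and then powers the VM off
>  kexec -p /boot/vmlinuz-crash --initrd=/boot/initrd-crash.img \
>        --append="irqpoll maxcpus=1 reset_devices"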
>
>Cheers,
>
>Funs
>
>
>-----Original Message-----
>From: Alex Huang [mailto:Alex.Huang@citrix.com]
>Sent: Friday, 6 September 2013 2:05
>To: dev@cloudstack.apache.org; Marcus Sorensen
>Cc: Roeland Kuipers; int-cloud
>Subject: RE: [DISCUSS] OOM killer and Routing/System VM's = :(
>
>If I recall correctly, the OOM killer actually prints something into
>syslog, so a cron job that watches syslog and simply shuts the VM down
>should work.
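>
>Something along these lines (untested sketch; the log path and the match
>strings are assumptions and would need checking against the system VM's
>distro):
>
>  #!/usr/bin/env python
>  # Scan syslog for OOM-killer activity and power the VM off so that the
>  # HA logic reprovisions the router with a clean configuration.
>  import re
>  import subprocess
>
>  OOM = re.compile(r"invoked oom-killer|Out of memory")
>
>  with open("/var/log/syslog") as log:
>      if any(OOM.search(line) for line in log):
>          subprocess.call(["poweroff", "-f"])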
>
>--Alex
>
>> -----Original Message-----
>> From: Chiradeep Vittal [mailto:Chiradeep.Vittal@citrix.com]
>> Sent: Thursday, September 5, 2013 12:48 PM
>> To: dev@cloudstack.apache.org; Marcus Sorensen
>> Cc: Roeland Kuipers; int-cloud
>> Subject: Re: [DISCUSS] OOM killer and Routing/System VM's = :(
>> 
>> Maintaining a custom kernel is a big hassle, even if the change is only
>> a few lines of code.
>> Can we do something in userspace? What about the software watchdog
>> that is available?
>> Along the lines of: http://goo.gl/oO3Lzr
>> http://linux.die.net/man/8/watchdog
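>> 
>> As a rough illustration (untested; the threshold below is an assumption),
>> the userspace watchdog can be told to react to memory pressure via
>> /etc/watchdog.conf:
>> 
>>   # use the softdog module if there is no hardware watchdog device
>>   watchdog-device = /dev/watchdog
>>   # minimum free memory, in pages, before the watchdog takes action
>>   min-memory = 2560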
>> 
>> 
>> On 9/5/13 7:13 AM, "Funs Kessen" <FK...@schubergphilis.com> wrote:
>> 
>> >
>> >> Well, you can't, as far as I've looked in the source of panic.c. So
>> >> I'm thinking of investigating adding -1 as an option and seeing if I
>> >> can push halt in; let's hope the guys that do kernel stuff find this
>> >> useful too.....
>> >>
>> >So it seems the patch I conjured up for panic.c is seen as not so
>> >useful; there is, however, another way to achieve the same result. It
>> >would mean that we load a crash kernel with our own .sh script as init
>> >to do our bidding.
>> >
>> >Would that be a plan?
>> >
>> >Cheers,
>> >
>> >Funs
>> >
>> >Sent from my iPhone
>> >
>> >On 4 Sep 2013, at 23:35, "Marcus Sorensen" <sh...@gmail.com>
>> >wrote:
>> >
>> >> What would work as a quick fix for this sort of situation would be if
>> >> the machine could be configured to power off rather than reboot on
>> >> OOM. Then the HA system would restart the VM, applying all configs.
>> >>
>> >> Anyone know how to do that? :-)
>> >>
>> >> On Wed, Sep 4, 2013 at 1:14 PM, Darren Shepherd
>> >> <da...@gmail.com> wrote:
>> >>> On 09/04/2013 11:37 AM, Roeland Kuipers wrote:
>> >>>>
>> >>>> Hi Darren,
>> >>>>
>> >>>> Thanks for your reply! Could you share a bit more on your
>> >>>> plans/ideas?
>> >>>>
>> >>>> We have also been brainstorming other approaches to managing the
>> >>>> system VMs, especially small customizations for specific tenants,
>> >>>> and maybe even leveraging config mgmt tools like Chef or Puppet,
>> >>>> with the ability to integrate CS with that in some way.
>> >>>
>> >>> I'll have to send the full details later, but here's a rough idea.
>> >>> The basic approach is this.  Logical changes to the VRs (or system
>> >>> VMs in general) get mapped to configuration items.  So adding an LB
>> >>> rule maps to iptables config and haproxy config.  When you change an
>> >>> LB rule, we bump up the requested version of the configuration for
>> >>> iptables/haproxy.  So the requested version will be, say, 4, while
>> >>> the applied version will be 3, as the VR still has the old
>> >>> configuration.  Since 4 != 3, the VR will be signaled to pull the
>> >>> latest iptables/haproxy config.  Say in the meantime somebody else
>> >>> adds four other LB rules, so the requested version is now at 8.
>> >>> When the VR pulls the config it will get version 8, and then reply
>> >>> back saying it applied version 8.  The applied version is now 8,
>> >>> which is greater than 4 (the version the first LB rule change was
>> >>> waiting for), so all async jobs waiting for the LB change will be
>> >>> done.
>> >>>
>> >>> To pull the configuration, the VR will hit a templating
>> >>> configuration system.  So it pulls the full iptables and haproxy
>> >>> config, not incremental changes.
>> >>>
>> >>> So if the VR ever reboots itself, it can easily just pull the
>> >>> latest config of everything and apply it.  So it will be consistent.
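>> >>>
>> >>> In rough pseudo-code (names made up here, not actual ACS code), the
>> >>> requested/applied handshake looks something like this:
>> >>>
>> >>>   class ConfigItem:
>> >>>       def __init__(self):
>> >>>           self.requested = 0  # bumped by the mgmt server on each change
>> >>>           self.applied = 0    # acked by the VR after it applies a pull
>> >>>
>> >>>       def change(self):
>> >>>           self.requested += 1
>> >>>           return self.requested      # version this change waits for
>> >>>
>> >>>       def vr_pull(self):
>> >>>           # VR pulls the full rendered config, not an incremental diff
>> >>>           self.applied = self.requested
>> >>>           return self.applied
>> >>>
>> >>>       def satisfied(self, waiting_for):
>> >>>           return self.applied >= waiting_for
>> >>>
>> >>>   lb = ConfigItem()
>> >>>   first = lb.change()        # requested = 1
>> >>>   for _ in range(4):
>> >>>       lb.change()            # four more rules, requested = 5
>> >>>   lb.vr_pull()               # VR applies version 5 in one pull
>> >>>   assert lb.satisfied(first) # the first rule's async job completes too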
>> >>>
>> >>> I'd be interested to hear what type of customizations you would
>> >>> like to add.
>> >>> It will definitely be an extensible system, but the problem is if
>> >>> your extensions want to touch the same configuration files that ACS
>> >>> wants to manage.  That gets a bit tricky, as it's really easy for
>> >>> one to break the other.  But I can definitely add some hooks that
>> >>> users can use to mess things up and "void the warranty."
>> >>>
>> >>> I've thought about Chef and Puppet for this, but basically it comes
>> >>> down to two things: I'm really interested in this being fast and
>> >>> lightweight, and Ruby is neither of those.  So the core ACS stuff
>> >>> will probably remain very simple shell scripts.  Simple in that they
>> >>> really just need to download configuration and restart services;
>> >>> they know nothing about the nature of the changes.  If, as an
>> >>> extension, you want to do something with Puppet or Chef, I'd be open
>> >>> to that.  That's your deal.
>> >>>
>> >>> This approach has many other benefits.  For example, we can ensure
>> >>> that as we deploy a new ACS release, existing system VMs can be
>> >>> updated (without a reboot, unless the kernel changes).
>> >>> Additionally, it's fast, and updates happen in near-constant time.
>> >>> Most changes will take just a couple of seconds, even if you have
>> >>> 4000 LB rules.
>> >>>
>> >>> Darren
>> >>>
>