Posted to issues@cloudstack.apache.org by "Roeland Kuipers (JIRA)" <ji...@apache.org> on 2013/09/04 19:58:56 UTC

[jira] [Created] (CLOUDSTACK-4607) Reboot router on out-of-memory vs OOM killer

Roeland Kuipers created CLOUDSTACK-4607:
-------------------------------------------

             Summary: Reboot router on out-of-memory vs OOM killer
                 Key: CLOUDSTACK-4607
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4607
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Virtual Router
    Affects Versions: 4.1.1
            Reporter: Roeland Kuipers


We experienced a serious outage on a redundant routing VM pair due to the OOM killer. Somehow the master node ran out of memory and the OOM killer decided to kill random processes, causing HAProxy to go down. But since keepalived was still running and functioning, a failover never happened.
In our experience, it is better to panic on OOM than to pray that the OOM killer will do the right thing, since in 99% of cases it just renders the machine useless.
If this RvR had panicked and rebooted, we would have had a clean keepalived failure/failover without much impact on our customer.

To counter this scenario, we would rather see a panic and reboot on an out-of-memory condition instead of relying on the OOM killer, which is a big gamble.
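
As a rough illustration of what this could look like on the router VM: Linux exposes the vm.panic_on_oom and kernel.panic sysctls, which together make the kernel panic on OOM and reboot after a delay. The sketch below is only an assumption about how one might express and check such settings. The 10-second delay, the PROPOSED fragment, and the helper names are hypothetical and not taken from CloudStack.

```python
# Hypothetical sysctl.conf-style fragment for the router VM (values are assumptions).
PROPOSED = """
# Panic instead of invoking the OOM killer (2 = always panic)
vm.panic_on_oom = 2
# Reboot 10 seconds after a panic (the delay is an arbitrary choice)
kernel.panic = 10
"""


def parse_sysctl(text):
    """Parse 'key = value' lines from sysctl.conf-style text into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def reboots_on_oom(settings):
    """Return True if these settings mean: panic on OOM, then reboot automatically."""
    panic_on_oom = int(settings.get("vm.panic_on_oom", "0"))
    panic_timeout = int(settings.get("kernel.panic", "0"))
    # kernel.panic == 0 means the machine stays halted after a panic,
    # so keepalived on the backup would take over but the node never returns.
    return panic_on_oom in (1, 2) and panic_timeout != 0


if __name__ == "__main__":
    print(reboots_on_oom(parse_sysctl(PROPOSED)))
```

With both settings in place, an OOM condition would take the whole node down, so keepalived on the backup router would see the failure and take over, rather than the master staying up with HAProxy dead.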

See also CLOUDSTACK-4605 and CLOUDSTACK-4606

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira