Posted to users@cloudstack.apache.org by Tony Fica <tf...@sdf.org> on 2015/07/07 21:53:01 UTC

KVM Node running out of RAM

We have several KVM nodes running CloudStack 4.4.2.  Sometimes an
instance provisioned with X amount of RAM will be started on a host
that has only slightly more than X free. The kernel OOM killer will
eventually kill off the instance.  Has anyone else seen this behavior?
Is there a way to reserve RAM for use by the host rather than by
CloudStack? Looking at the numbers in the database and the logs,
CloudStack is trying to use 100% of the RAM on the host.

Any thoughts would be appreciated.

Thanks,

Tony Fica

Re: KVM Node running out of RAM

Posted by Logan Barfield <lb...@tqhosting.com>.
I'll look into the overcommit and OOM tweaks, thanks.

We've unfortunately had to disable KSM because it was causing
consistent network issues on our VMs.  There was a known kernel bug
that was patched, but then reintroduced as a regression.
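
(For reference, turning it off is a one-liner against sysfs; a minimal
sketch, assuming the standard interface, where writing "2" also unmerges
pages that were already shared:)

    # Stop the ksmd scanner and split any already-merged pages.
    # Requires root; writing "0" would stop scanning but keep merges.
    with open("/sys/kernel/mm/ksm/run", "w") as f:
        f.write("2")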

I still think adding a "host"-level overcommit tunable in CloudStack
would be a good idea, though.  I don't think there's any reason
CloudStack has to be naive about this sort of thing.  Being able to set
"virtual" limits in one place makes it easier for admins to maintain
headroom, and I don't see many downsides (other than the time spent
implementing it).

Thank You,

Logan Barfield
Tranquil Hosting


On Thu, Jul 9, 2015 at 5:51 AM, Nux! <nu...@li.nux.ro> wrote:
>> This is where I was going with this as well. Keep the over-provisioning
>> level under 1 and cut off at 85%.
>>
>>> - B) Add a new setting for a "host" level "cut-off threshold" in
>>> addition to the existing "cluster" level threshold.
>> This would be a better safeguard. Also, the kernel can restrict how much
>> total memory may be committed. Look at the "vm.overcommit_memory" and
>> "vm.overcommit_ratio" sysctls.
>>
>> You can stay under a specific threshold to avoid OOMs.
>
> You can also tell the OOM killer not to touch the KVM processes, as in this post I've just written:
> http://www.nux.ro/archive/2015/07/Protect_KVM_processes_from_OOM_killer.html
>
> In addition, KSM ("memory deduplication", to simplify) may help as well, at the cost of the extra CPU needed for page scans/merges.
>
> HTH
> Lucian

Re: KVM Node running out of RAM

Posted by Nux! <nu...@li.nux.ro>.
> This is where I was going with this as well. Keep the over-provisioning
> level under 1 and cut off at 85%.
> 
>> - B) Add a new setting for a "host" level "cut-off threshold" in
>> addition to the existing "cluster" level threshold.
> This would be a better safeguard. Also, the kernel can restrict how much
> total memory may be committed. Look at the "vm.overcommit_memory" and
> "vm.overcommit_ratio" sysctls.
> 
> You can stay under a specific threshold to avoid OOMs.

You can also tell the OOM killer not to touch the KVM processes, as in this post I've just written:
http://www.nux.ro/archive/2015/07/Protect_KVM_processes_from_OOM_killer.html
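
The usual mechanism behind that is oom_score_adj; a minimal Python
sketch (assumes qemu process names start with "qemu"; -1000 fully
exempts a process from the OOM killer; not necessarily the exact
approach in the post):

    import glob, os

    # Exempt all qemu processes from the OOM killer. Requires root.
    for comm_path in glob.glob("/proc/[0-9]*/comm"):
        try:
            with open(comm_path) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited while we were scanning
        if comm.startswith("qemu"):
            pid_dir = os.path.dirname(comm_path)
            with open(os.path.join(pid_dir, "oom_score_adj"), "w") as f:
                f.write("-1000")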

In addition, KSM ("memory deduplication", to simplify) may help as well, at the cost of the extra CPU needed for page scans/merges.
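
If you want to see whether KSM is earning its keep, the sysfs counters
give a rough picture (a sketch; assumes 4 KiB pages):

    # pages_sharing / pages_shared is the average dedup ratio of the
    # merged pages; pages_sharing * page size approximates the saving.
    def ksm_stat(name):
        with open("/sys/kernel/mm/ksm/" + name) as f:
            return int(f.read())

    shared = ksm_stat("pages_shared")
    sharing = ksm_stat("pages_sharing")
    print("~%.0f MiB saved, ratio %.1f"
          % (sharing * 4096 / 2**20, sharing / shared if shared else 0.0))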

HTH
Lucian

Re: KVM Node running out of RAM

Posted by ilya <il...@gmail.com>.
Please see responses inline.

On 7/8/15 7:51 AM, Logan Barfield wrote:
> We've run into this a few times as well.
>
> As I understand it, the "cut-off threshold" applies to a cluster, not
> to an individual hypervisor.  So a KVM hypervisor can get filled up
> regardless (an overcommit of "1" still means "100%", and a hypervisor
> can fill up while the cluster is still < 85% full).
>
> Right now CloudStack doesn't always show an accurate memory count on
> the host (e.g., a host with 128GB shows up as ~126GB).  The difference
> isn't because of "reserved" memory; it's due to how Java does the
> byte-to-gigabyte math.
>
> KVM (as far as I've found) doesn't have "reserved" memory the way a
> Xen Dom0 does, so currently you have to keep tabs on memory usage
> outside of CloudStack.
>
> This becomes a bigger issue when you take guest disk caching into
> account, since even with plenty of actual headroom the hypervisor will
> quickly eat into swap because of caching.  When you deploy a new VM the
> hypervisor (or Linux, really) is SUPPOSED to immediately reclaim any
> spare memory from the cache and hand it to the new VM process, but in
> practice it doesn't always work as well as it should.  The problem is
> also much more common when deploying just a few VMs with a LOT of
> memory per host (e.g., a host with 128GB of memory running 3 VMs with
> 32GB each plus 1 VM with 24GB, for a total of 120GB).
>
> The solution as I see it is to implement one of the following:
> - A) Allow setting host "overprovisioning" ratios of less than "1"
> (e.g., 0.85).  This would effectively act as a "cap" on hypervisor
> resources.
This is where I was going with this as well. Keep the over-provisioning
level under 1 and cut off at 85%.

> - B) Add a new setting for a "host" level "cut-off threshold" in
> addition to the existing "cluster" level threshold.
This would be a better safeguard. Also, the kernel can restrict how much
total memory may be committed. Look at the "vm.overcommit_memory" and
"vm.overcommit_ratio" sysctls.

You can stay under a specific threshold to avoid OOMs.
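
For example (a sketch of strict overcommit accounting; the values are
illustrative, and the same effect comes from "sysctl
vm.overcommit_memory=2 vm.overcommit_ratio=85"):

    # Cap committed memory at swap + 85% of physical RAM. Requires root.
    with open("/proc/sys/vm/overcommit_memory", "w") as f:
        f.write("2")   # 2 = strict accounting, no commit past the limit
    with open("/proc/sys/vm/overcommit_ratio", "w") as f:
        f.write("85")  # CommitLimit = swap + 85% of RAM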

>
> Thoughts?
>
> Thank You,
>
> Logan Barfield
> Tranquil Hosting
>
>
> On Wed, Jul 8, 2015 at 1:49 AM, ilya <il...@gmail.com> wrote:
>> Perhaps memory overcommit is set to greater than 1? What is your cut-off
>> threshold? It is usually set to 85%, which means you always leave 15% of
>> memory in reserve.
>>
>> Also, is the memory balloon driver in your guest VMs disabled?
>>
>> As a test, create some VMs that total slightly under your cut-off
>> threshold, i.e., if your total is 256GB of RAM on the host and 217GB is
>> usable (assuming 85%), create 4 VMs with 50GB of RAM each.
>>
>> Next, install mprime or another memory load generator in each VM to
>> allocate all of its memory and see if the OOM recurs on the hypervisor.
>>
>> Regards
>> ilya
>>
>> On 7/7/15 12:53 PM, Tony Fica wrote:
>>> We have several KVM nodes running Cloudstack 4.4.2. Sometimes an instance
>>> with X amount of RAM provisioned will be started on a host that has X+a
>>> small amount of RAM free. The kernel OOM killer will eventually kill off the
>>> instance.  Has anyone else seen this behavior, is there a way to reserve RAM
>>> for use by the host instead of by Cloudstack? Looking at the numbers in the
>>> database and the logs, Cloudstack is trying to use 100% of the RAM on the
>>> host.
>>>
>>> Any thoughts would be appreciated.
>>>
>>> Thanks,
>>>
>>> Tony Fica
>>


Re: KVM Node running out of RAM

Posted by Logan Barfield <lb...@tqhosting.com>.
We've run into this a few times as well.

As I understand it, the "cut-off threshold" applies to a cluster, not
to an individual hypervisor.  So a KVM hypervisor can get filled up
regardless (an overcommit of "1" still means "100%", and a hypervisor
can fill up while the cluster is still < 85% full).

Right now CloudStack doesn't always show an accurate memory count on
the host (e.g., a host with 128GB shows up as ~126GB).  The difference
isn't because of "reserved" memory; it's due to how Java does the
byte-to-gigabyte math.
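
(As a purely illustrative sketch of how the conversion alone can move
the displayed figure; the byte count below is made up, not taken from a
real host:)

    # The same byte count rendered different ways drifts apart quickly:
    mem_bytes = 135_291_469_824          # hypothetical figure from the agent
    print(mem_bytes / 1000**3)           # 135.3 decimal "GB"
    print(mem_bytes / 1024**3)           # 126.0 binary GiB
    print(mem_bytes // 1024**3)          # 126 after integer truncation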

KVM (as far as I've found) doesn't have "reserved" memory the way a
Xen Dom0 does, so currently you have to keep tabs on memory usage
outside of CloudStack.
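
For that kind of out-of-band check, something like this works (a sketch
using the libvirt Python bindings; it compares the RAM committed to
running guests against the host's MemTotal):

    import libvirt

    conn = libvirt.open("qemu:///system")
    # maxMemory() is the guest's configured memory ceiling, in KiB.
    committed_kib = sum(dom.maxMemory()
                        for dom in conn.listAllDomains() if dom.isActive())
    with open("/proc/meminfo") as f:
        total_kib = int(f.readline().split()[1])  # first line is MemTotal
    print("guests committed to %.1f%% of host RAM"
          % (100.0 * committed_kib / total_kib))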

This becomes a bigger issue when you take guest disk caching into
account, since even with plenty of actual headroom the hypervisor will
quickly eat into swap because of caching.  When you deploy a new VM the
hypervisor (or Linux, really) is SUPPOSED to immediately reclaim any
spare memory from the cache and hand it to the new VM process, but in
practice it doesn't always work as well as it should.  The problem is
also much more common when deploying just a few VMs with a LOT of
memory per host (e.g., a host with 128GB of memory running 3 VMs with
32GB each plus 1 VM with 24GB, for a total of 120GB).
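
A quick way to see how much of the "used" memory is really just
reclaimable cache before placing a big VM (a sketch; MemAvailable needs
kernel 3.14+):

    # Dump the interesting /proc/meminfo fields in GiB.
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":")
            fields[key] = int(rest.split()[0])  # value in kB

    for key in ("MemTotal", "MemFree", "MemAvailable",
                "Cached", "SwapCached"):
        print("%-12s %7.1f GiB" % (key, fields.get(key, 0) / 2**20))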

The solution as I see it is to implement one of the following:
- A) Allow setting host "overprovisioning" ratios of less than "1"
(e.g., 0.85).  This would effectively act as a "cap" on hypervisor
resources.
- B) Add a new setting for a "host" level "cut-off threshold" in
addition to the existing "cluster" level threshold.


Thoughts?

Thank You,

Logan Barfield
Tranquil Hosting


On Wed, Jul 8, 2015 at 1:49 AM, ilya <il...@gmail.com> wrote:
> Perhaps memory overcommit is set to greater than 1? What is your cut-off
> threshold? It is usually set to 85%, which means you always leave 15% of
> memory in reserve.
>
> Also, is the memory balloon driver in your guest VMs disabled?
>
> As a test, create some VMs that total slightly under your cut-off
> threshold, i.e., if your total is 256GB of RAM on the host and 217GB is
> usable (assuming 85%), create 4 VMs with 50GB of RAM each.
>
> Next, install mprime or another memory load generator in each VM to
> allocate all of its memory and see if the OOM recurs on the hypervisor.
>
> Regards
> ilya
>
> On 7/7/15 12:53 PM, Tony Fica wrote:
>>
>> We have several KVM nodes running CloudStack 4.4.2. Sometimes an instance
>> provisioned with X amount of RAM will be started on a host that has only
>> slightly more than X free. The kernel OOM killer will eventually kill off
>> the instance.  Has anyone else seen this behavior? Is there a way to
>> reserve RAM for use by the host rather than by CloudStack? Looking at the
>> numbers in the database and the logs, CloudStack is trying to use 100% of
>> the RAM on the host.
>>
>> Any thoughts would be appreciated.
>>
>> Thanks,
>>
>> Tony Fica
>
>

Re: KVM Node running out of RAM

Posted by ilya <il...@gmail.com>.
Perhaps memory overcommit is set to greater than 1? What is your cut-off
threshold? It is usually set to 85%, which means you always leave 15% of
memory in reserve.

Also, is the memory balloon driver in your guest VMs disabled?
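
One way to check from the host (a sketch with the libvirt Python
bindings; the domain name is hypothetical):

    import libvirt
    import xml.etree.ElementTree as ET

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("i-2-10-VM")   # hypothetical instance name

    # A functioning balloon driver populates these counters.
    print(dom.memoryStats())               # e.g. {'actual': ..., 'rss': ...}

    # The device model itself lives in the domain XML.
    balloon = ET.fromstring(dom.XMLDesc()).find(".//memballoon")
    print(balloon.get("model") if balloon is not None
          else "no balloon device")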

As a test, create some VMs that total slightly under your cut-off
threshold, i.e., if your total is 256GB of RAM on the host and 217GB is
usable (assuming 85%), create 4 VMs with 50GB of RAM each.

Next, install mprime or another memory load generator in each VM to
allocate all of its memory and see if the OOM recurs on the hypervisor.
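
If mprime isn't handy, a few lines of Python inside each guest do the
same job (a sketch; it allocates ~90% of the guest's RAM and touches
every page so the allocation is actually backed by physical memory):

    # Run inside the guest VM, not on the hypervisor.
    with open("/proc/meminfo") as f:
        total_kib = int(f.readline().split()[1])  # MemTotal

    target = int(total_kib * 1024 * 0.9)
    buf = bytearray(target)
    for i in range(0, target, 4096):
        buf[i] = 1                                # fault each page in
    input("memory held; press Enter to release")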

Regards
ilya
On 7/7/15 12:53 PM, Tony Fica wrote:
> We have several KVM nodes running Cloudstack 4.4.2. Sometimes an 
> instance with X amount of RAM provisioned will be started on a host 
> that has X+a small amount of RAM free. The kernel OOM killer will 
> eventually kill off the instance.  Has anyone else seen this behavior, 
> is there a way to reserve RAM for use by the host instead of by 
> Cloudstack? Looking at the numbers in the database and the logs, 
> Cloudstack is trying to use 100% of the RAM on the host.
>
> Any thoughts would be appreciated.
>
> Thanks,
>
> Tony Fica