Posted to dev@cloudstack.apache.org by Marty Sweet <ms...@gmail.com> on 2014/02/23 14:20:46 UTC

Segfault: Top & Sampling Rates (kvm.resource.LibvirtComputingResource)

Hi,

I have just noticed the following error message appearing occasionally
in kern.log. It is happening on all but one of my nodes. Is anyone else
experiencing this issue?
=====
Feb 23 06:53:24 aurora kernel: [10185338.400091] top[27631]: segfault
at 0 ip 00007f025eba3315 sp 00007fff3f9ed308 error 4 in
libc-2.15.so[7f025ea6f000+1b5000]
=====

I happened to have one of the nodes in trace mode, which shows that
cloudstack-agent is the process starting top:

/var/log/cloudstack/agent/agent.log
======
2014-02-23 06:53:23,654 DEBUG [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-1:null) Executing: /bin/bash -c idle=$(top -b -n
1|grep Cpu\(s\):|cut -d% -f4|cut -d, -f2);echo $idle
2014-02-23 06:53:23,661 DEBUG [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -b -n
1|grep Cpu\(s\):|cut -d% -f4|cut -d, -f2);echo $idle
======
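
To make the logged pipeline easier to follow, here is a minimal sketch that runs the same two cut stages on a canned line instead of live top output. The variable names and the sample line are assumptions for illustration only, and the field positions hold only for the "Cpu(s):  x%us, ..." layout; other top versions format this line differently.

```shell
#!/bin/bash
# Sample line in the assumed "Cpu(s):" layout (illustrative, not live data).
sample='Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 92.4%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st'

# Stage 1: split on '%' and keep the 4th piece, which ends with the idle number.
stage1=$(printf '%s\n' "$sample" | cut -d% -f4)    # "ni, 92.4"

# Stage 2: split on ',' and keep the 2nd piece, leaving the number with a leading space.
stage2=$(printf '%s\n' "$stage1" | cut -d, -f2)    # " 92.4"

# Unquoted on purpose: word splitting drops the leading space,
# just as "echo $idle" does in the agent's command.
echo $stage2
```

The result depends entirely on idle being the fourth percentage on the line, so a different column order would silently return a different counter.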

## This led me to find the following (potential) bug:

When I run this manually I get the same result each time (waiting 2 seconds before each command):
======
root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 92.4%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 92.4%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 92.4%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
root@aurora:/var/log/cloudstack/agent# top -b -n2 | grep Cpu
Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 92.4%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu(s): 29.2%us,  1.1%sy,  0.0%ni, 69.1%id,  0.5%wa,  0.0%hi,  0.1%si,  0.0%st
=======
The apparent explanation, quoted from Server Fault:
"This is because top, vmstat, iostat all in their first run collect
data since the last reboot time of the system.
And the successive iterations run on the sampling period that you
specify. So, in the first run of top, you will see the %idle time
because from the time of reboot to the time of running top, it was
that much % idle. But in next iterations, since it is busy it doesn't
show any %idle.
Exclude the first iteration and try sampling over the interval you want."
http://serverfault.com/questions/436446/top-showing-64-idle-on-first-screen-or-batch-run-while-there-is-no-idle-time-a
========
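
The "sample over an interval" idea in the quote can also be sketched without top, by reading the aggregate cpu line from /proc/stat twice and comparing the counters. This is a different technique from the one the agent uses, shown only to illustrate interval sampling. The function name idle_between is hypothetical, and the sketch assumes the usual Linux field order (user nice system idle iowait irq softirq steal).

```shell
#!/bin/bash
# idle_between: hypothetical helper, not part of the agent.
# Takes two aggregate "cpu" lines from /proc/stat and prints the
# idle percentage for the interval between them.
idle_between() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total[NR] = 0
          # Sum user..steal (fields 2-9); guest time is already counted in user.
          for (i = 2; i <= 9; i++) total[NR] += $i
          idle[NR] = $5 }
        END { dt = total[2] - total[1]
              # No elapsed ticks: print 0.0 rather than divide by zero.
              if (dt <= 0) { print "0.0"; exit }
              printf "%.1f\n", 100 * (idle[2] - idle[1]) / dt }'
}

# Usage on a live Linux system: sample twice, one second apart.
if [ -r /proc/stat ]; then
    first=$(grep '^cpu ' /proc/stat)
    sleep 1
    second=$(grep '^cpu ' /proc/stat)
    idle_between "$first" "$second"
fi
```

Because both samples are taken by the caller, the since-boot average never enters the result.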

Wouldn't this result in Cloudstack-Agent getting the wrong idle value
for the system?

If this hasn't been fixed in 4.3.0, I will create a patch along the
following lines (if others agree):
/bin/bash -c idle=$(top -d0.01 -b -n 2|grep Cpu\(s\):|tail -n1|cut -d%
-f4|cut -d, -f2);echo $idle
-> top -d0.01 shortens the refresh interval so the command completes
faster.
-> tail -n1 takes the last line of the output (the latest idle value).
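
As a rough check of the proposed change, the sketch below feeds canned two-iteration output through the same stages plus tail -n1. The helper name last_idle is hypothetical, and the two input lines are copied from the manual run earlier in this message; it is an illustration of the idea, not the actual patch.

```shell
#!/bin/bash
# last_idle: hypothetical helper, not part of the agent.
# Reads batch-mode top output on stdin and prints the idle value
# from the final Cpu(s) line only.
last_idle() {
    local idle
    idle=$(grep 'Cpu(s):' | tail -n1 | cut -d% -f4 | cut -d, -f2)
    # Unquoted so the leading space left by cut is dropped.
    echo $idle
}

# Canned two-iteration output: the first line is the since-boot
# average, the second covers only the sampling interval.
two_runs='Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 92.4%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu(s): 29.2%us,  1.1%sy,  0.0%ni, 69.1%id,  0.5%wa,  0.0%hi,  0.1%si,  0.0%st'
printf '%s\n' "$two_runs" | last_idle

# Live use would look like: top -d 0.01 -b -n 2 | last_idle
```

With the sample input this prints the second iteration's idle value rather than the since-boot figure.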

Ubuntu 12.04 / KVM / CS 4.2.0
Linux aurora 3.5.0-34-generic #55~precise1-Ubuntu SMP Fri Jun 7
16:25:50 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Thanks,
Marty