Posted to bugs@httpd.apache.org by bu...@apache.org on 2002/07/02 23:34:46 UTC
DO NOT REPLY [Bug 10426] New: - load average high when httpd doing nothing

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10426>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10426


           Summary: load average high when httpd doing nothing
           Product: Apache httpd-2.0
           Version: 2.0.39
          Platform: PC
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Platform
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: pab@balancewd.com


On a lightly loaded system, the load average goes up by 1 permanently
for each httpd child process that handles any HTTP traffic at all.

When you start the daemon and it forks the minimum number of httpd children,
the load is near zero.
After you fetch any document, such as /index.html, the child process that
served it raises the load average by 1.00 from then on, even with no other
connections incoming on any of the daemons.  If you do it again and get a
different child, that one does the same thing; now the load is around
2.00, and so on.

top and ps do not show any process using much CPU.  The CPU is always
95-100% idle, which is really weird!  If you leave the system in this state
for 12 hours with the load high, the CPU time accumulated by each process is
still very low (less than 1 minute).  If these processes were really
consuming CPU, each should have accumulated at least an hour or two of CPU time!

If you send SIGHUP to the main httpd, it kills and restarts its children, and
the load drops back to around 0.00.  The same happens if you completely kill
and restart the daemon.

No errors about this appear in syslog or in Apache's error_log.
The system is completely usable as a web server, and logging in
and typing in a telnet window feels fine
(it does not feel like a load of 5.00).
As far as we can see, the web server is fully functional; only the load is high.

Operating System: BSDI 4.1 with all patches up to date
Platform: Compaq DL380 (Pentium III)

Running ktrace yields some interesting information; it looks like some
sort of SysV semaphore issue.  Here is a ktrace of a "good" child httpd
(before the problem occurs):

 11781 httpd    CALL  sigprocmask(0x3,0)
 11781 httpd    RET   sigprocmask -65809/0xfffefeef
 11781 httpd    CALL  gettimeofday(0x8047570,0)
 11781 httpd    RET   gettimeofday 0
 11781 httpd    CALL  setitimer(0,0x8047568,0)
 11781 httpd    RET   setitimer 0
 11781 httpd    CALL  sigreturn(0)
 11781 httpd    RET   sigreturn JUSTRETURN
 11781 httpd    PSIG  SIGalrm caught handler=0x281c5d20 mask=0x0 code=0x0
 11781 httpd    CALL  sigprocmask(0x3,0)
 11781 httpd    RET   sigprocmask -65809/0xfffefeef
 11781 httpd    CALL  gettimeofday(0x8047570,0)
 11781 httpd    RET   gettimeofday 0
 11781 httpd    CALL  setitimer(0,0x8047568,0)
 11781 httpd    RET   setitimer 0
 11781 httpd    CALL  sigreturn(0)
 11781 httpd    RET   sigreturn JUSTRETURN
 11781 httpd    PSIG  SIGalrm caught handler=0x281c5d20 mask=0x0 code=0x0


And here is the same ktrace on an httpd child process after it has
served one document and the load has gone up:

 11766 httpd    CALL  sigprocmask(0x3,0)
 11766 httpd    RET   sigprocmask -65809/0xfffefeef
 11766 httpd    CALL  gettimeofday(0x8047910,0)
 11766 httpd    RET   gettimeofday 0
 11766 httpd    CALL  setitimer(0,0x8047908,0)
 11766 httpd    RET   setitimer 0
 11766 httpd    CALL  sigreturn(0)
 11766 httpd    RET   sigreturn JUSTRETURN
 11766 httpd    CALL  semop(0xd0000,0x280faf1c,0x1)
 11766 httpd    PSIG  SIGalrm caught handler=0x281c5d20 mask=0x0 code=0x0
 11766 httpd    RET   semop -1 errno 4 Interrupted system call
 11766 httpd    CALL  sigprocmask(0x3,0)
 11766 httpd    RET   sigprocmask -65809/0xfffefeef
 11766 httpd    CALL  gettimeofday(0x8047910,0)
 11766 httpd    RET   gettimeofday 0
 11766 httpd    CALL  setitimer(0,0x8047908,0)
 11766 httpd    RET   setitimer 0
 11766 httpd    CALL  sigreturn(0)
 11766 httpd    RET   sigreturn JUSTRETURN
 11766 httpd    CALL  semop(0xd0000,0x280faf1c,0x1)
 11766 httpd    PSIG  SIGalrm caught handler=0x281c5d20 mask=0x0 code=0x0
 11766 httpd    RET   semop -1 errno 4 Interrupted system call


Searching groups.google.com for "apache bsdi load" shows that some people
had this very same problem back in 1997, with Apache 1.0 and 1.1.
I couldn't find any messages newer than about 1998 reporting it.
No real resolution was given, but someone recommended the ktrace approach used above.

From the ktrace, it looks like the itimer goes off (perhaps because semop()
is blocking indefinitely?) and sends SIGALRM to the process.  The signal
interrupts the semop(), which fails with errno 4 (EINTR), and the process then
calls semop() again.  In the "good" output above, no semop() call is left
blocked until SIGALRM interrupts it.
So maybe a semaphore (lock) is not being released?


We did not experience this issue with Apache 1.3.9 on this platform,
which we use on 50+ systems today.

We are going to try Apache 1.3.<latest> next.
