Posted to apache-bugdb@apache.org by Rick Franchuk <ri...@transpect.net> on 1997/09/09 20:30:05 UTC

general/1107: Runaway httpd process under heavy load

>Number:         1107
>Category:       general
>Synopsis:       Runaway httpd process under heavy load
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    apache (Apache HTTP Project)
>State:          open
>Class:          sw-bug
>Submitter-Id:   apache
>Arrival-Date:   Tue Sep  9 11:30:02 1997
>Originator:     rickf@transpect.net
>Organization:
apache
>Release:        1.2.4
>Environment:
Linux www 2.0.29 #5 Sat Sep 6 12:27:17 CDT 1997 i586 (also on 2.0.30)
gcc version 2.7.2.1 (also 2.7.2.2)
>Description:
Under moderate to heavy loads (200+ open servers), Apache servers will
periodically "lock up". I compiled with -g and found that select() seems to be
failing under heavy loads (possibly a result of insufficient file descriptors?).

Killing the process always restores the machine to full operation. The problem
in the code is a hard loop condition in http_main.c's child_main(): if an error
occurs resulting in srv<=0, execution IMMEDIATELY loops back to call select()
again, which causes another error, and so on.
>How-To-Repeat:
Under heavy loads on Linux, the problem happens with enough frequency to be
Real Damn Annoying(tm). Get a site doing 200+ simultaneous connections and there's
a good chance it'll happen at some point.
>Fix:
In line 1783 (http_main.c, child_main()), change the 'continue' to an 'exit'. If
one SIGTERMs the runaway process, the undesirable behavior doesn't travel over to
other children (not for a while, anyways).
This is just a workaround. I think the problem lies within Linux itself.
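
To illustrate the difference between the 'continue' and the 'exit' behavior,
here is a minimal standalone sketch. It is NOT the code from http_main.c: the
function names (poll_once, make_bad_fd, spin_on_error, stop_on_error) and the
max_iters bound are hypothetical, and the select() failure is forced with a
closed descriptor (EBADF) rather than by real load.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: one select() call on a single fd, zero timeout. */
static int poll_once(int fd)
{
    fd_set readfds;
    struct timeval tv = {0, 0};

    /* FD_SET is undefined outside this range. */
    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    return select(fd + 1, &readfds, NULL, NULL, &tv);
}

/* Hypothetical helper: return a descriptor number that is already closed,
 * so that select() on it fails with EBADF. Returns -1 if pipe() fails. */
int make_bad_fd(void)
{
    int p[2];

    if (pipe(p) != 0)
        return -1;
    close(p[0]);
    close(p[1]);
    return p[0];
}

/* 'continue' behavior: on error, go straight back to select().
 * max_iters exists only so this demo terminates; without it the loop
 * would spin forever on a persistent error.
 * Returns the number of failed select() calls. */
int spin_on_error(int fd, int max_iters)
{
    int failures = 0;

    while (failures < max_iters) {
        int srv = poll_once(fd);
        if (srv < 0 && errno != EINTR) {
            failures++;
            continue;           /* retry immediately: the hard loop */
        }
        break;                  /* select() succeeded or was interrupted */
    }
    return failures;
}

/* 'exit' behavior: give up after the first persistent error.
 * A real child process would call exit() here; the sketch just returns.
 * Returns the number of failed select() calls (0 or 1). */
int stop_on_error(int fd)
{
    int srv = poll_once(fd);

    if (srv < 0 && errno != EINTR)
        return 1;               /* stand-in for exit(1) in a child */
    return 0;
}
```

With a descriptor from make_bad_fd(), spin_on_error() fails on every iteration
up to the bound, while stop_on_error() fails once and stops, which is the
effect the workaround is after.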
>Audit-Trail:
>Unformatted: