Posted to users@httpd.apache.org by Matthijs van der Klip <ma...@spill.nl> on 2004/07/08 11:16:14 UTC
[users@httpd] segmentation faults and zombie processes
Hi,
Hoping that someone can shed some light on the following (this has become
quite a long read I'm afraid):
ENVIRONMENT
-----------
I'm one of the administrators of a pool of seven webservers being
loadbalanced by LVS (http://linuxvirtualserver.org). Linux / Apache /
MySQL / PHP installation on all seven webservers is identical:
- RedHat Linux 8.0 with kernel 2.4.20-28.8smp
- Apache 1.3.31 with patches applied, most of which originate from RedHat
(See below for a list of patches, and configure script)
- MySQL 4.0.20 client installed from RPMs from http://mysql.com
(shared library linked to PHP module)
- PHP 4.3.7 without patches applied compiled as a module
(See below for configure script)
Basically this consists of a customized Redhat 7.3 Apache 1.3.27
installation ported to RedHat 8.0 and upgraded to Apache 1.3.31.
At peak times (15:00 - 22:00) up to 100 (mostly PHP) requests per second
per server are handled. Server hardware mostly consists of Dual Xeon
2.4GHz machines which handle this kind of load with around 80% idle time
available. Memory consumption is close to ideal: all 2GB of RAM are in use
and no swapping is done.
RedHat supplied Apache patches (in order applied):
http://www.vdklip.nl/apache/apache_1.3.29-config.patch
http://www.vdklip.nl/apache/apache_1.3.31-apxs.patch
http://www.vdklip.nl/apache/apache_1.3.14-mkstemp.patch
http://www.vdklip.nl/apache/apache_1.3.14-redhat.patch
http://www.vdklip.nl/apache/apache_1.3.20-apachectl-init.patch
http://www.vdklip.nl/apache/apache_1.3.23-dbmdb.patch
http://www.vdklip.nl/apache/apache_1.3.27-db.patch
http://www.vdklip.nl/apache/apache_1.3.22-CAN-2003-0020.patch
Custom Apache patches (applied after RedHat patches):
http://www.vdklip.nl/apache/hard_server_limit.patch
http://www.vdklip.nl/apache/apachectl_status.patch
Apache configure script:
http://www.vdklip.nl/apache/apache.configure.txt
PHP configure script:
http://www.vdklip.nl/apache/php.configure.txt
PROBLEMS
--------
Our servers crash more often than desirable. This does not cause any
major interruption to our websites because we're loadbalancing but it is a
pain in the ass for us administrators because we have not been able
to find the cause of these crashes. Servers just sit there being pingable
but no login can be done either through SSH or through the console. It is
what I would call a 'userland' crash. We suspect somehow no more processes
can be created. Sometimes a server can crash multiple times on one day, at
other times it might run for months uninterrupted. All seven servers have
experienced this kind of crash at some time though. After rebooting
nothing is to be found in the logs.
Recently we experienced some issues with Apache that point in the
direction of the crashes described above. We found out some Apache
processes end in a segmentation fault:
[Wed Jul 7 16:50:55 2004] [notice] child pid 3848 exit signal
Segmentation fault (11)
Restarting Apache (the hard way; doing a full stop and a clean start)
seemed to get rid of these segmentation faults for a while. While examining
where these segmentation faults originated we decided to add a special
case to our Apache monitoring script in the meantime. This monitoring
script runs every five minutes and checks the state of the Apache server.
To these checks we added that whenever a segmentation fault is found in
the last 10 lines of the main error log (which is separated from our
virtualhost error logs), Apache will be restarted (again: the hard way).
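The check described above can be sketched roughly as follows. This is a minimal illustration in Python, not our actual monitoring script: the function name `segfault_in_tail` and its parameters are made up for the example, and it assumes each error log entry sits on a single line in the log file (the entries quoted in this mail are wrapped by the mail client).

```python
from collections import deque


def segfault_in_tail(log_path, tail_lines=10, needle="Segmentation fault"):
    """Return True if `needle` occurs in the last `tail_lines` lines of the log.

    Hypothetical helper: the name, the defaults and the matching rule are
    assumptions made for this sketch.
    """
    with open(log_path, "r", errors="replace") as log:
        # A bounded deque keeps only the final `tail_lines` lines in memory.
        tail = deque(log, maxlen=tail_lines)
    return any(needle in line for line in tail)
```

One property of a check like this is worth noting: it only looks at the position of the message in the log, so it will match again on the next run if fewer than 10 new lines have been written since the segmentation fault was logged.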
On several occasions though this check (or rather its acting on finding a
segmentation fault) has caused a server crash as described above. Order of
events:
1) Apache child process exits with segmentation fault. This is being
logged in the main error log:
[Wed Jul 7 16:50:55 2004] [notice] child pid 3848 exit signal
Segmentation fault (11)
2) Our monitoring script is being run by cron and finds the segmentation
fault in the log. Upon this it decides to restart the Apache server by
issuing a '/etc/rc.d/init.d/httpd restart' which sends a SIGTERM to
the main httpd process and tries to start a new one.
3) Apache receives a SIGTERM signal and tries to shut down, but some child
processes are not willing to cooperate:
[Wed Jul 7 16:54:02 2004] [notice] caught SIGTERM, shutting down
[Wed Jul 7 16:59:02 2004] [warn] child process 4634 still did not exit,
sending a SIGTERM
[Wed Jul 7 16:59:02 2004] [warn] child process 4639 still did not exit,
sending a SIGTERM
[Wed Jul 7 16:59:02 2004] [warn] child process 4727 still did not exit,
sending a SIGTERM
[Wed Jul 7 16:59:02 2004] [warn] child process 4740 still did not exit,
sending a SIGTERM
[Wed Jul 7 16:59:02 2004] [warn] child process 4796 still did not exit,
sending a SIGTERM
[Wed Jul 7 16:59:02 2004] [warn] child process 4823 still did not exit,
sending a SIGTERM
[Wed Jul 7 16:59:02 2004] [warn] child process 4855 still did not exit,
sending a SIGTERM
[Wed Jul 7 16:59:02 2004] [warn] child process 4861 still did not exit,
sending a SIGTERM
[Wed Jul 7 16:59:02 2004] [warn] child process 4883 still did not exit,
sending a SIGTERM
I'm not sure how to explain the five minutes difference between
receipt of the main SIGTERM signal and Apache warning that its child
processes did not exit. Can it be that Apache waits for exactly five
minutes? Alternatively it could be a second run of our monitoring
script, but that seems unlikely as we have not received output from
such a second run and on top of that there is no second '[notice]
caught SIGTERM, shutting down' message.
4) In the meantime the monitoring script tries to start a new Apache
instance:
[Wed Jul 7 16:54:03 2004] [notice] Apache/1.3.31 (Unix) (Red-Hat/Linux)
configured -- resuming normal operations
[Wed Jul 7 16:54:03 2004] [notice] Accept mutex: sysvsem (Default:
sysvsem)
It seems to succeed in this, although I have seen other occasions
where this ends in an 'Address already in use' message, meaning the
'old' Apache process did not exit before the 'new' one was started.
5) Something (Apache?) goes berserk and the server becomes very
unresponsive. This is observed by the loadbalancer, which starts removing and
adding this server as a result of the server sometimes responding to
requests and sometimes not.
6) Some time later (around 17:30) this is observed by an operator who
informs me. About a half hour later I try to login but the server does
not respond. Somewhat earlier the loadbalancer stopped adding the
server too, so it seems it has crashed.
7) I cycle power on the system and it becomes available again:
[Wed Jul 7 18:24:07 2004] [notice] Apache/1.3.31 (Unix) (Red-Hat/Linux)
configured -- resuming normal operations
[Wed Jul 7 18:24:07 2004] [notice] Accept mutex: sysvsem (Default:
sysvsem)
8) Logging in again I monitor the server for a while and I begin noticing
some load spikes which coincide with some 'httpd <defunct>' processes
(zombies). They disappear after some seconds but do increase the load
noticeably.
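Regarding step 4 above: the 'Address already in use' case suggests the new instance is started while the old parent process still exists. A minimal sketch of waiting for the old parent to disappear before starting a new one is given below. It is an illustration only, not what the init script does; `pid_alive` and `wait_for_exit` are hypothetical names, and the PID is assumed to have been read from Apache's PidFile before the stop was issued.

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this PID still exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def wait_for_exit(pid, timeout=30.0, interval=0.5):
    """Poll until `pid` is gone; return False if it is still there at timeout."""
    deadline = time.monotonic() + timeout
    while pid_alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A restart wrapper could call `wait_for_exit(old_pid)` after sending SIGTERM and only start the new instance when it returns True. Note that this watches the parent only; children that ignore the shutdown would need the same treatment.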
At this point I'm open to any suggestion. Specifically I'm interested in:
1) Is my current Apache setup any good? Meaning:
a) Do the RedHat patches applied to the Apache 1.3.31 distribution make
any sense? They seem to, as far as I can tell, but I'd like to
hear from the experts.
b) It seems unlikely, but is there anything wrong with my
configuration?:
http://www.vdklip.nl/apache/httpd.conf
Virtualhosts are included but only contain 'VirtualHost *' sections.
2) I know a full stop and start isn't the recommended way of restarting
Apache, but it seems the only way when dealing with an unstable Apache
server (causing segmentation faults). Isn't it?
3) Is it normal to experience zombie httpd processes or shouldn't they
appear at all on a properly configured Apache server?
4) Is it normal to have some child processes which cannot be terminated by
Apache on the first occasion ('child process still did not exit'
warnings)?
5) It seems most likely (to me) the main cause of all this lies within
my PHP and not my Apache setup. I have seen some bug reports which
suggest compiling PHP with the 'enable-sigchild' option:
http://bugs.php.net/bug.php?id=6805
Does anyone on this list have any experience with this? I cannot find
(google) any information on this option which tells me exactly what it
does...
6) We're in the process of upgrading from RedHat 8.0 to Fedora Core 2 mainly
because of the inclusion of the 2.6 kernel. Anyone out there running
Apache 1.3 / PHP 4 on FC2 on a large scale? Maybe we could exchange some
tips...
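As background to question 3: a zombie ('<defunct>') is a child process that has exited but whose parent has not yet collected its exit status with wait(). The sketch below reproduces that state on any POSIX system. It is a generic illustration and says nothing about how Apache itself reaps its children; all function names are invented for the example.

```python
import os
import time


def spawn_unreaped_child():
    """Fork a child that exits at once; the parent does not wait for it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup handlers
    return pid


def has_process_table_entry(pid):
    """Return True while the kernel still holds an entry for `pid`."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    return True


def reap(pid):
    """Collect the child's exit status, which removes the zombie entry."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


if __name__ == "__main__":
    child = spawn_unreaped_child()
    time.sleep(0.2)  # the child has exited by now and shows up as <defunct>
    print("entry before reaping:", has_process_table_entry(child))
    print("exit status:", reap(child))
    print("entry after reaping:", has_process_table_entry(child))
```

Between the child's exit and the call to `reap`, `ps` lists the child as defunct. How long that window lasts depends on how soon the parent gets around to calling wait().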
As said earlier, any help is appreciated. I hope this is a comprehensive
report and I did not leave anything out.
Best regards,
--
Matthijs van der Klip
System Administrator
Spill E-Projects
The Netherlands